Abstract: Objective: In point cloud scenes, semantic segmentation is a visual task essential to scene understanding. Because images are structured while point clouds are unstructured, convolution on point clouds is generally harder than convolution on images and consumes more computation and memory. As a consequence, large-scale scenes often have to be segmented block by block, which is inefficient and fails to capture sufficient scene context. To address this problem, this paper designs a computation- and memory-efficient network architecture for end-to-end semantic segmentation of large-scale scenes. Method: A spatial depthwise residual (SDR) block is designed by combining spatial depthwise convolution with a residual structure; it is efficient in both computation and memory and learns geometric features from point clouds effectively. In addition, a dilated feature aggregation (DFA) module is designed that enlarges the receptive field effectively while adding only a small amount of computation. Combining SDR blocks and DFA modules, this paper builds SDRNet (spatial depthwise residual network), an encoder-decoder deep network for semantic segmentation of large-scale point cloud scenes. Because the distribution of the inputs to the spatial convolution kernels is unfavorable for training, hierarchical normalization is proposed to reduce the difficulty of parameter learning. In particular, to exploit the rotational invariance of sparse LiDAR point clouds, a special SDR block is proposed that eliminates the effect of rotating LiDAR data about the Z axis, significantly improving the network's performance when processing LiDAR point clouds. Result: The proposed method is evaluated on the S3DIS (Stanford large-scale 3D indoor space) and Semantic KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) datasets, and the relationship between point count and frame rate is analyzed. The method achieves a mean intersection over union (mIoU) of 71.7% on S3DIS and 59.1% mIoU on the Semantic KITTI online single-scan evaluation. Conclusion: Experimental results show that the proposed SDRNet can perform semantic segmentation directly on large-scale scenes, and the results on S3DIS and Semantic KITTI demonstrate good accuracy. The analysis of point count versus frame rate indicates that SDRNet maintains high accuracy at a high frame rate.
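The abstract names the ingredients of the SDR block (a spatial depthwise convolution over each point's neighborhood, a channel-mixing step, and a residual shortcut) without giving the exact layer composition. The following PyTorch sketch is an assumption-laden illustration of that general pattern, not the paper's implementation: the class name SDRBlock, the k-nearest-neighbor input convention, and all layer sizes are hypothetical.

import torch
import torch.nn as nn

class SDRBlock(nn.Module):
    """Hypothetical sketch of a spatial depthwise residual block over k-NN features."""

    def __init__(self, channels: int, k: int = 16):
        super().__init__()
        self.k = k
        # Depthwise step: one spatial filter per channel, applied across
        # the k neighbor slots of every point (no cross-channel mixing).
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=(1, k),
                                   groups=channels, bias=False)
        # Pointwise step: 1x1 convolution that mixes information across channels.
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, feats: torch.Tensor, knn_idx: torch.Tensor) -> torch.Tensor:
        # feats:   (B, C, N)  per-point features
        # knn_idx: (B, N, k)  indices of each point's k nearest neighbors
        B, C, N = feats.shape
        idx = knn_idx.reshape(B, 1, N * self.k).expand(B, C, N * self.k)
        neighbors = torch.gather(feats, 2, idx).reshape(B, C, N, self.k)
        out = self.depthwise(neighbors)           # (B, C, N, 1): aggregate neighborhood
        out = self.bn(self.pointwise(out))        # (B, C, N, 1): mix channels
        return self.act(out.squeeze(-1) + feats)  # residual shortcut

A DFA-style variant of this sketch could gather neighbors at several dilation rates (for example every 1st, 2nd, and 4th entry of a distance-sorted k-NN list) and merge the resulting features, which would enlarge the receptive field for little extra computation, in the spirit of the module the abstract describes.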
Funding: supported by the National Key Research and Development Program of China (No. 2018YFB1305200) and the Science Technology Department of Zhejiang Province (No. LGG19F020010).
Abstract: Event-based cameras generate sparse event streams and capture high-speed motion information; however, as the time resolution increases, the spatial resolution decreases sharply. Although generative adversarial networks have achieved remarkable results in traditional image restoration, directly applying them to event inpainting obscures the fast-response characteristic of the event camera and does not fully exploit the sparsity of the event stream. To tackle these challenges, an event-inpainting network is proposed. The number and structure of the network layers are redesigned to suit the sparsity of events, and the dimensionality of the convolutions is increased to retain more spatiotemporal information. To ensure the temporal consistency of the inpainted images, an event sequence discriminator is added. Tests were performed on the DHP19 and MVSEC datasets. Compared with the state-of-the-art traditional image inpainting method, the proposed method reduces the number of parameters by 93.5% and increases inference speed by 6 times without substantially degrading the quality of the restored images. In addition, a human pose estimation experiment shows that the model can fill in human motion information in high-frame-rate scenes.
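The abstract states that the convolution dimensionality is raised to retain spatiotemporal information and that an event sequence discriminator enforces temporal consistency, but it gives no architectural details. The sketch below is a minimal illustration under stated assumptions (events binned into a polarity-by-time voxel grid, a PatchGAN-style scoring head, illustrative layer widths); none of these choices are confirmed by the source.

import torch
import torch.nn as nn

class EventSequenceDiscriminator(nn.Module):
    """Hypothetical 3D-convolutional discriminator over an event sequence."""

    def __init__(self, in_channels: int = 2):
        super().__init__()

        def block(cin: int, cout: int) -> nn.Sequential:
            # Downsample in space only, keeping the time resolution so the
            # head can still judge consistency between neighboring time bins.
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, stride=(1, 2, 2), padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            )

        self.features = nn.Sequential(
            block(in_channels, 32),
            block(32, 64),
            block(64, 128),
        )
        # PatchGAN-style head: score local spatiotemporal patches rather than
        # emitting a single scalar for the whole sequence.
        self.head = nn.Conv3d(128, 1, kernel_size=3, padding=1)

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (B, 2, T, H, W) event voxel grid, one channel per polarity;
        # returns a (B, 1, T, H/8, W/8) grid of real/fake patch scores.
        return self.head(self.features(voxels))

Scoring patches across both space and time is one plausible way to penalize inpainted frames that look realistic individually but flicker across time bins, which matches the temporal-consistency motivation given for the sequence discriminator.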