Abstract: Semantic segmentation has recently progressed rapidly, but existing methods focus only on identifying objects or instances. In this work, we address the task of semantic scene understanding with deep learning. Unlike many existing methods, ours proposes techniques that improve existing algorithms rather than a whole new framework. The first technique is objectness enhancement. It uses a detection module to produce object region proposals with category probabilities, and these regions are used to weight the parsing feature map directly. An 'extra background' category is often added to the category space to improve parsing results in semantic and instance segmentation tasks. In scene parsing tasks, the extra background category still helps the model during training. However, some pixels may be assigned to this nonexistent category at inference. The second technique, black-hole filling, is proposed to avoid this incorrect classification. To verify the two techniques, we integrate them into a parsing framework that generates the parsing result. We call this unified framework the Objectness Enhancement Network (OENet). Compared with previous work, OENet improves performance over the original model on the SceneParse150 scene parsing dataset, reaching 38.4 mIoU (mean intersection-over-union) and 77.9% accuracy on the validation set without combining multiple models. Its effectiveness is also verified on the Cityscapes dataset.
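The abstract does not say how black-hole filling is implemented. One plausible reading is sketched below: a pixel whose top-scoring class is the extra background category is reassigned to its best-scoring real category. The function name `fill_black_holes`, the `scores[c][y][x]` layout, and the choice of index 0 for the background channel are all assumptions made for illustration. They are not taken from the paper.

```python
# Hypothetical sketch of "black-hole filling" at inference time.
# Assumption: scores[c][y][x] holds the score of class c at pixel (y, x),
# and channel 0 is the extra background category used only in training.

BACKGROUND = 0  # assumed index of the extra background category


def fill_black_holes(scores):
    """Return a label map that never contains BACKGROUND."""
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    real_classes = [c for c in range(num_classes) if c != BACKGROUND]

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Ordinary argmax over every channel, background included.
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            if best == BACKGROUND:
                # "Black hole": fall back to the best real category.
                best = max(real_classes, key=lambda c: scores[c][y][x])
            row.append(best)
        labels.append(row)
    return labels


# A 1x2 image with 3 channels (background plus two real classes).
scores = [
    [[0.6, 0.1]],  # background
    [[0.1, 0.7]],  # real class 1
    [[0.3, 0.2]],  # real class 2
]
labels = fill_black_holes(scores)
```

In this form the result equals an argmax restricted to the real categories. The two-step version is kept only to make the filled pixels explicit.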
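Abstract: [Objective] To enable indoor robots to accurately recognize different categories of indoor objects and thus choose safer, more feasible routes, a self-distillation multi-stage cascaded network (SMCNet) based on self-distillation and dual modalities is proposed for indoor scene parsing. [Methods] First, the segmentation transformer (SegFormer) is used as the backbone network to extract features from the red green blue (RGB) image and the depth map separately in a two-stream manner, yielding four groups of feature outputs. Second, a feature enhancement module (FEM) is designed to enhance these four groups of features and then fuse them group by group, so that the useful information in the two modalities is fully extracted and blended. Finally, a self-distillation supervision module (SSM) is designed to transfer valuable information from high-level features to low-level features through self-distillation, and a multi-stage cascaded supervision module (MCSM) is designed for cross-layer supervision, producing the final prediction map. [Results] On the indoor dual-modality datasets New York University Depth version 2 (NYUDv2) and scene understanding red green blue-depth (SUN RGB-D), the proposed model outperforms existing methods under the same conditions, with mean intersection over union (MIoU) reaching 57.3% and 53.1% on NYUDv2 and SUN RGB-D, respectively. [Conclusions] SMCNet parses the different categories of objects in indoor scenes fairly accurately and can provide technical support for indoor robots in acquiring indoor visual information.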
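The MIoU figures above are averages of per-class intersection over union. A minimal sketch of the metric follows, assuming flattened lists of integer labels and skipping classes that appear in neither the prediction nor the ground truth. The function name `mean_iou` and that skipping convention are illustrative assumptions, and benchmark evaluation scripts may differ in detail.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two flat lists of class labels."""
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
miou = mean_iou(pred, target, num_classes=2)
```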
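Abstract: To address the problem that coal and rock media show only slight differences in their characteristics under short-duration active thermal excitation, which makes the coal-rock interface hard to identify quickly and accurately, a fast and accurate coal-rock interface identification method based on an improved pyramid scene parsing network (PSPnet) model with MobileNetV2 is proposed. An active infrared coal-rock test platform was built to acquire infrared thermal images of the coal-rock interface under short-duration active thermal excitation, and a coal-rock infrared image dataset was constructed. The conventional PSPnet model was improved by adopting the lightweight MobileNetV2 network as the backbone for feature extraction, which greatly reduced the model's memory footprint and training time. In addition, the convolutional block attention module (CBAM) was fused with the upsampled feature layer of the pyramid scene parsing (PSP) module and with the shallow feature layer of the PSPnet model, effectively improving the model's ability to refine features. Test results show that the improved PSPnet-MobileNetV2 model occupies only 9.12 MB, 94.88% less than the original PSPnet model; the intersection over union for coal and rock is 96.52% and 96.87%, an improvement of 8.29% and 7.7%, respectively; pixel accuracy is 97.25% and 99.15%, an improvement of 7.32% and 1.64% over the original model, respectively; and test time is reduced by 53.70%. The method provides an effective technical means for fast, accurate identification of the coal-rock interface in advance.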
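The abstract gives the improved model's size and the percentage reduction but not the size of the original PSPnet model. If the 94.88% figure is read as a reduction relative to the original size (an assumption), the implied baseline can be back-calculated as below and comes to roughly 178 MB. The helper name `implied_baseline` is hypothetical. The same step is not applied to the IoU and accuracy figures, because the abstract does not say whether those improvements are relative or in percentage points.

```python
def implied_baseline(new_value, relative_reduction_percent):
    """Back out the original value from a new value and a relative reduction.

    Assumes: new_value = baseline * (1 - relative_reduction_percent / 100).
    """
    remaining_fraction = 1.0 - relative_reduction_percent / 100.0
    if remaining_fraction <= 0.0:
        raise ValueError("reduction must be below 100 percent")
    return new_value / remaining_fraction


# Figures quoted in the abstract: 9.12 MB after a 94.88% reduction.
baseline_mb = implied_baseline(9.12, 94.88)
print(f"implied original model size: about {baseline_mb:.0f} MB")
```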