Abstract
Objective High-resolution remote sensing images usually contain complex semantic information and easily confused objects, which makes their semantic segmentation an important and challenging task. Based on the DeepLab V3+ network architecture combined with a tree-structured neural network module, we design a semantic segmentation network for high-resolution remote sensing images. Method The proposed network not only modifies DeepLab V3+ so that it suits multiscale, multimodal data, but also appends a tree-structured neural network module after it. By building a confusion matrix, extracting a confusion graph, and constructing a graph partition, the tree structure distinguishes easily confused pixels better and yields more accurate segmentation results. Result Experiments were carried out on two remote sensing image sets of different cities provided by the International Society for Photogrammetry and Remote Sensing (ISPRS). The model performs best in terms of overall accuracy (OA), reaching 90.4% and 90.7% on the Vaihingen and Potsdam datasets, respectively, an improvement of 10.3% and 17.4% over its baseline results and a clear improvement over three state-of-the-art methods listed on the official ISPRS website. Conclusion The proposed convolutional neural network combining DeepLab V3+ with a tree structure effectively improves the overall accuracy of semantic segmentation for high-resolution remote sensing images, and the segmentation accuracy for easily confused classes increases markedly. In high-resolution remote sensing images with complex semantic information, the reduction of pixel errors between easily confused classes also brings a considerable gain in the overall segmentation accuracy of the tree-structured network model.
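The tree-structured refinement described above rests on three steps: computing a class confusion matrix, turning it into a confusion graph, and partitioning that graph so that easily confused classes are grouped and handled together. The sketch below is a minimal illustration of that general idea, not the authors' implementation: it assumes NumPy and scikit-learn, and the function names (`build_confusion_graph`, `partition_confused_classes`) and the choice of spectral clustering for the graph partition are illustrative assumptions.

```python
# Hedged sketch (not the paper's code): build a class-confusion graph from
# validation predictions and partition it into groups of mutually confused classes.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.cluster import SpectralClustering

def build_confusion_graph(y_true, y_pred, num_classes):
    """Symmetric confusion graph: edge weight = how often two classes are mixed up."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(num_classes))).astype(float)
    cm /= cm.sum(axis=1, keepdims=True) + 1e-12   # row-normalize counts into error rates
    graph = cm + cm.T                              # symmetrize off-diagonal confusion
    graph += 1e-3                                  # small constant keeps the graph connected
    np.fill_diagonal(graph, 0.0)                   # self-agreement is not confusion
    return graph

def partition_confused_classes(graph, num_groups=2):
    """Cut the confusion graph into groups of classes that confuse each other."""
    clusterer = SpectralClustering(n_clusters=num_groups,
                                   affinity="precomputed", random_state=0)
    return clusterer.fit_predict(graph)

# Toy example: classes 0/1 and 2/3 are frequently confused with each other.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 4, size=10000)
    noise = rng.random(10000)
    y_pred = y_true.copy()
    y_pred[(y_true == 0) & (noise < 0.3)] = 1      # class 0 <-> 1 confusion
    y_pred[(y_true == 2) & (noise < 0.3)] = 3      # class 2 <-> 3 confusion
    g = build_confusion_graph(y_true, y_pred, num_classes=4)
    print(partition_confused_classes(g, num_groups=2))  # e.g. [0 0 1 1]
```

Grouping confused classes this way is what would let a downstream tree of sub-classifiers concentrate on the hard within-group distinctions rather than on classes that are already well separated.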
Objective High-resolution remote sensing image segmentation refers to the task of assigning a semantic label to each pixel in an image. Recently, with the rapid development of remote sensing technology, we have been able to easily obtain very-high-resolution remote sensing images with a ground sampling distance of 5 cm to 10 cm. However, the very heterogeneous appearance of objects such as buildings, streets, trees, and cars in very-high-resolution data leads to high intra-class variance while inter-class variance is low, which makes this task challenging. A research hotspot is detailed 2D semantic segmentation that assigns labels to multiple object categories. Traditional image processing methods depend on vectorization-model extraction techniques, for example those based on region segmentation, line analysis, and shadow analysis. Another mainstream line of work relies on supervised classifiers with manually designed features. These models do not generalize well when dealing with high-resolution remote sensing images. Recently, deep-learning-based technology has helped explore the high-level semantic information in images and provides an end-to-end approach to semantic segmentation. Method Based on DeepLab V3+, we propose an adaptively constructed neural network that contains two connected modules, namely the segmentation module and the tree module. When segmenting remote sensing images, which contain multiscale objects, understanding the context is important. To handle the problem of segmenting objects at multiple scales, DeepLab V3+ employs atrous convolution in cascade or in parallel to capture multiscale context by adopting multiple atrous rates. We adopt a similar idea in designing the segmentation module. This module uses an encoder-decoder architecture. The encoder is composed of four structures: Entry Flow, Middle Flow, Exit Flow, and atrous spatial pyramid pooling (ASPP). The decoder is composed of two layers of separable convolution blocks. The middle flow has two Xception blocks, which are linear stacks of depth-separable convolutions.
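The ASPP component mentioned above applies parallel atrous convolutions with different dilation rates, plus a global-pooling branch, and fuses the branches to capture multiscale context. The following is a minimal sketch of such a block, assuming PyTorch; the dilation rates (6, 12, 18), channel widths, and the class name `ASPP` are illustrative choices, not taken from the paper's code.

```python
# Hedged sketch of an ASPP-style block in the spirit of DeepLab V3+ (assumed PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)       # 1x1 conv branch
        self.atrous = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)  # parallel atrous convs
            for r in rates
        ])
        self.image_pool = nn.Sequential(                              # global-context branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [self.branch1(x)] + [conv(x) for conv in self.atrous]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))      # fuse all branches

# Example: a 2048-channel encoder feature map of spatial size 32x32.
if __name__ == "__main__":
    aspp = ASPP(in_ch=2048)
    print(aspp(torch.randn(1, 2048, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```

Because each dilation rate enlarges the receptive field without reducing resolution, the concatenated branches let the segmentation module see both small objects (such as cars) and large structures (such as buildings) at the same feature resolution.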
Authors
Hu Wei; Gao Bochuan; Huang Zhenhang; Li Ruirui (College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China)
Source
Journal of Image and Graphics (《中国图象图形学报》)
Indexed in CSCD and the Peking University Core Journal list
2020, Issue 5, pp. 1043-1052 (10 pages)
Keywords
convolutional neural networks (CNN)
remote sensing images
semantic segmentation
tree-like structure
DeepLab V3+