Abstract
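Ship detection in synthetic aperture radar (SAR) imagery has been a research hotspot in recent years. Unlike optical images, however, the SAR imaging mechanism produces unintuitive feature representations. Moreover, because SAR image data are scarce, existing methods that rely on large numbers of labeled SAR images may struggle to achieve good detection performance. To address these problems, this paper proposes MCMA-Net (Multi-level Cross-Modality Alignment Network), a SAR ship detection algorithm based on multilevel cross-modality alignment that enhances SAR feature representation by transferring the rich knowledge of the optical modality to the SAR modality. First, a neighborhood–global attention-based feature interaction network, NGAN (Neighborhood-Global Attention Network), is designed. It applies a neighborhood attention mechanism to the shallow backbone features for local interaction and a global self-attention mechanism to the deep features for global context interaction. This improves the encoding of local features while retaining global context modeling, so that the network attends to the appropriate information at each level, which in turn facilitates the subsequent multilevel modality alignment. Second, a multilevel modality alignment module, MLMA (Multi-level Modality Alignment), is designed. It aligns the features of the two modalities, which lie in different latent spaces, progressively from the local level to the global level and then to the instance level. This helps the model learn modality-invariant features effectively, narrows the modality gap between optical and SAR images, and transfers knowledge from the optical modality to the SAR modality. Extensive experiments show that the proposed algorithm outperforms current detection algorithms and achieves the best experimental results.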
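The split between neighborhood attention on shallow features and global self-attention on deep features can be illustrated with a minimal sketch. It is not the paper's NGAN implementation: it assumes a 1D token sequence instead of a 2D feature map, uses identity query/key/value projections instead of learned ones, and simply clips the window at the borders. The names `attend` and `radius` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens, radius=None):
    """Scaled dot-product self-attention over a list of feature vectors.

    radius=None -> global attention: every query sees every token.
    radius=r    -> neighborhood attention: query i sees tokens j with |i - j| <= r.
    Assumption: identity Q/K/V projections, window clipped at the borders.
    """
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        if radius is None:
            lo, hi = 0, n
        else:
            lo, hi = max(0, i - radius), min(n, i + radius + 1)
        keys = tokens[lo:hi]
        scores = [
            sum(q * k for q, k in zip(tokens[i], key)) / math.sqrt(d)
            for key in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible tokens.
        out.append(
            [sum(w * key[c] for w, key in zip(weights, keys)) for c in range(d)]
        )
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    shallow = attend(feats, radius=1)  # local interaction only
    deep = attend(feats)               # global context interaction
    print(shallow[0], deep[0])
```

In this sketch, changing a distant token leaves the neighborhood output at position 0 unchanged but alters the global output, which is the behavioral difference the abstract relies on when assigning the two mechanisms to different backbone levels.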
In recent years, interest in synthetic aperture radar (SAR) ship detection has grown considerably. Its distinctive strengths make it pivotal in numerous fields of research. However, the inherent characteristics of SAR images present a range of challenges. For instance, in contrast to optical images, SAR images have counterintuitive feature representations. Additionally, because SAR image data are limited, existing methods that depend on large numbers of annotated SAR images may struggle to achieve satisfactory results. How to train a high-performance SAR ship detection network effectively with a limited quantity of SAR images therefore needs to be investigated. Given that single-modality SAR detection algorithms have inherent limitations, other modalities that can assist the SAR modality are needed. For instance, in SAR image target detection, optical images can serve as a supplementary data source. A knowledge-rich model can be developed by training on a large volume of optical data together with SAR data. Hence, reasonable training approaches for effectively utilizing images from both the SAR and optical modalities should be explored. To address these challenges, this paper proposes MCMA-Net, a SAR ship detection algorithm based on multilevel cross-modality alignment. MCMA-Net enriches SAR feature representation by incorporating valuable knowledge from the optical modality. First, we propose a neighborhood–global attention-based feature interaction network (NGAN), which employs a neighborhood attention mechanism for local interaction among low-level features and a global self-attention mechanism that captures global context from high-level features. While retaining global context modeling ability, NGAN improves the encoding of local features, enables the network to focus on the appropriate information at each level, and promotes the subsequent multilevel modality alignment. Second, we propose a multilevel modality alignment
Authors
何佳月
宿南
徐从安
尹璐
廖艳苹
闫奕名
HE Jiayue; SU Nan; XU Cong’an; YIN Lu; LIAO Yanping; YAN Yiming (College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China; Research Institute of Information Fusion, Naval Aviation University, Yantai 264001, China; Beijing Institute of Remote Sensing Information, Beijing 100192, China)
Source
《遥感学报》
EI
CSCD
Peking University Core Journal
2024, No. 7, pp. 1789-1801 (13 pages)
NATIONAL REMOTE SENSING BULLETIN
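Funding
National Natural Science Foundation of China (Nos. 62271159, 62071136, 62002083, 61971153)
Heilongjiang Provincial Outstanding Youth Fund (No. YQ2022F002)
Heilongjiang Provincial Postdoctoral Fund (Nos. LBH-Q20085, LBH-Z20051)
Fundamental Research Funds for the Central Universities (Nos. 3072022QBZ0805, 3072021CFT0801, 3072022CF0808)
High-Resolution Earth Observation (Gaofen) Special Project: Industrialization Demonstration of National Security Monitoring and Integrated Services in the China-Russia Border Region (No. 72-Y50G11-9001-22/23).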
Keywords
remote sensing
SAR
target detection
cross-modality
feature alignment
attention mechanism