目的传统显著性检测模型大多利用手工选择的中低层特征和先验信息进行物体检测,其准确率和召回率较低,随着深度卷积神经网络的兴起,显著性检测得以快速发展。然而,现有显著性方法仍存在共性缺点,难以在复杂图像中均匀地突显整个物体的...目的传统显著性检测模型大多利用手工选择的中低层特征和先验信息进行物体检测,其准确率和召回率较低,随着深度卷积神经网络的兴起,显著性检测得以快速发展。然而,现有显著性方法仍存在共性缺点,难以在复杂图像中均匀地突显整个物体的明确边界和内部区域,主要原因是缺乏足够且丰富的特征用于检测。方法在VGG(visual geometry group)模型的基础上进行改进,去掉最后的全连接层,采用跳层连接的方式用于像素级别的显著性预测,可以有效结合来自卷积神经网络不同卷积层的多尺度信息。此外,它能够在数据驱动的框架中结合高级语义信息和低层细节信息。为了有效地保留物体边界和内部区域的统一,采用全连接的条件随机场(conditional random field,CRF)模型对得到的显著性特征图进行调整。结果本文在6个广泛使用的公开数据集DUT-OMRON(Dalian University of Technology and OMRON Corporation)、ECSSD(extended complex scene saliency dataset)、SED2(segmentation evalution database 2)、HKU、PASCAL-S和SOD(salient objects dataset)上进行了测试,并就准确率—召回率(precision-recall,PR)曲线、F测度值(F-measure)、最大F测度值、加权F测度值和均方误差(mean absolute error,MAE)等性能评估指标与14种最先进且具有代表性的方法进行比较。结果显示,本文方法在6个数据集上的F测度值分别为0.696、0.876、0.797、0.868、0.772和0.785;最大F测度值分别为0.747、0.899、0.859、0.889、0.814和0.833;加权F测度值分别为0.656、0.854、0.772、0.844、0.732和0.762;MAE值分别为0.074、0.061、0.093、0.049、0.099和0.124。无论是前景和背景颜色相似的图像集,还是多物体的复杂图像集,本文方法的各项性能均接近最新研究成果,且优于大多数具有代表性的方法。结论本文方法对各种场景的图像显著性检测都具有较强的鲁棒性,同时可以使显著性物体的边界和内部�展开更多
Salient object detection(SOD)is a long-standing research topic in computer vision with increasing interest in the past decade.Since light fields record comprehensive information of natural scenes that benefit SOD in a...Salient object detection(SOD)is a long-standing research topic in computer vision with increasing interest in the past decade.Since light fields record comprehensive information of natural scenes that benefit SOD in a number of ways,using light field inputs to improve saliency detection over conventional RGB inputs is an emerging trend.This paper provides the first comprehensive review and a benchmark for light field SOD,which has long been lacking in the saliency community.Firstly,we introduce light fields,including theory and data forms,and then review existing studies on light field SOD,covering ten traditional models,seven deep learning-based models,a comparative study,and a brief review.Existing datasets for light field SOD are also summarized.Secondly,we benchmark nine representative light field SOD models together with several cutting-edge RGB-D SOD models on four widely used light field datasets,providing insightful discussions and analyses,including a comparison between light field SOD and RGB-D SOD models.Due to the inconsistency of current datasets,we further generate complete data and supplement focal stacks,depth maps,and multi-view images for them,making them consistent and uniform.Our supplemental data make a universal benchmark possible.Lastly,light field SOD is a specialised problem,because of its diverse data representations and high dependency on acquisition hardware,so it differs greatly from other saliency detection tasks.We provide nine observations on challenges and future directions,and outline several open issues.All the materials including models,datasets,benchmarking results,and supplemented light field datasets are publicly available at https://github.com/kerenfu/LFSOD-Survey.展开更多
Salient object detection(SOD)in RGB and depth images has attracted increasing research interest.Existing RGB-D SOD models usually adopt fusion strategies to learn a shared representation from RGB and depth modalities,...Salient object detection(SOD)in RGB and depth images has attracted increasing research interest.Existing RGB-D SOD models usually adopt fusion strategies to learn a shared representation from RGB and depth modalities,while few methods explicitly consider how to preserve modality-specific characteristics.In this study,we propose a novel framework,the specificity-preserving network(SPNet),which improves SOD performance by exploring both the shared information and modality-specific properties.Specifically,we use two modality-specific networks and a shared learning network to generate individual and shared saliency prediction maps.To effectively fuse cross-modal features in the shared learning network,we propose a cross-enhanced integration module(CIM)and propagate the fused feature to the next layer to integrate cross-level information.Moreover,to capture rich complementary multi-modal information to boost SOD performance,we use a multi-modal feature aggregation(MFA)module to integrate the modalityspecific features from each individual decoder into the shared decoder.By using skip connections between encoder and decoder layers,hierarchical features can be fully combined.Extensive experiments demonstrate that our SPNet outperforms cutting-edge approaches on six popular RGB-D SOD and three camouflaged object detection benchmarks.The project is publicly available at https://github.com/taozh2017/SPNet.展开更多
Previous video object segmentation approachesmainly focus on simplex solutions linking appearance and motion,limiting effective feature collaboration between these two cues.In this work,we study a novel and efficient ...Previous video object segmentation approachesmainly focus on simplex solutions linking appearance and motion,limiting effective feature collaboration between these two cues.In this work,we study a novel and efficient full-duplex strategy network(FSNet)to address this issue,by considering a better mutual restraint scheme linking motion and appearance allowing exploitation of cross-modal features from the fusion and decoding stage.Specifically,we introduce a relational cross-attention module(RCAM)to achieve bidirectional message propagation across embedding sub-spaces.To improve the model’s robustness and update inconsistent features from the spatiotemporal embeddings,we adopt a bidirectional purification module after the RCAM.Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios(e.g.,motion blur and occlusion),and compares well to leading methods both for video object segmentation and video salient object detection.The project is publicly available at https://github.com/GewelsJI/FSNet.展开更多
文摘目的传统显著性检测模型大多利用手工选择的中低层特征和先验信息进行物体检测,其准确率和召回率较低,随着深度卷积神经网络的兴起,显著性检测得以快速发展。然而,现有显著性方法仍存在共性缺点,难以在复杂图像中均匀地突显整个物体的明确边界和内部区域,主要原因是缺乏足够且丰富的特征用于检测。方法在VGG(visual geometry group)模型的基础上进行改进,去掉最后的全连接层,采用跳层连接的方式用于像素级别的显著性预测,可以有效结合来自卷积神经网络不同卷积层的多尺度信息。此外,它能够在数据驱动的框架中结合高级语义信息和低层细节信息。为了有效地保留物体边界和内部区域的统一,采用全连接的条件随机场(conditional random field,CRF)模型对得到的显著性特征图进行调整。结果本文在6个广泛使用的公开数据集DUT-OMRON(Dalian University of Technology and OMRON Corporation)、ECSSD(extended complex scene saliency dataset)、SED2(segmentation evalution database 2)、HKU、PASCAL-S和SOD(salient objects dataset)上进行了测试,并就准确率—召回率(precision-recall,PR)曲线、F测度值(F-measure)、最大F测度值、加权F测度值和均方误差(mean absolute error,MAE)等性能评估指标与14种最先进且具有代表性的方法进行比较。结果显示,本文方法在6个数据集上的F测度值分别为0.696、0.876、0.797、0.868、0.772和0.785;最大F测度值分别为0.747、0.899、0.859、0.889、0.814和0.833;加权F测度值分别为0.656、0.854、0.772、0.844、0.732和0.762;MAE值分别为0.074、0.061、0.093、0.049、0.099和0.124。无论是前景和背景颜色相似的图像集,还是多物体的复杂图像集,本文方法的各项性能均接近最新研究成果,且优于大多数具有代表性的方法。结论本文方法对各种场景的图像显著性检测都具有较强的鲁棒性,同时可以使显著性物体的边界和内部�
基金supported by the National Natural Science Foundation of China(Nos.62176169 and 61703077)SCU-Luzhou Municipal People's Government Strategic Cooperation Projetc(t No.2020CDLZ-10)+1 种基金supported by the National Natural Science Foundation of China(No.62172228)supported by the National Natural Science Foundation of China(No.61773270).
文摘Salient object detection(SOD)is a long-standing research topic in computer vision with increasing interest in the past decade.Since light fields record comprehensive information of natural scenes that benefit SOD in a number of ways,using light field inputs to improve saliency detection over conventional RGB inputs is an emerging trend.This paper provides the first comprehensive review and a benchmark for light field SOD,which has long been lacking in the saliency community.Firstly,we introduce light fields,including theory and data forms,and then review existing studies on light field SOD,covering ten traditional models,seven deep learning-based models,a comparative study,and a brief review.Existing datasets for light field SOD are also summarized.Secondly,we benchmark nine representative light field SOD models together with several cutting-edge RGB-D SOD models on four widely used light field datasets,providing insightful discussions and analyses,including a comparison between light field SOD and RGB-D SOD models.Due to the inconsistency of current datasets,we further generate complete data and supplement focal stacks,depth maps,and multi-view images for them,making them consistent and uniform.Our supplemental data make a universal benchmark possible.Lastly,light field SOD is a specialised problem,because of its diverse data representations and high dependency on acquisition hardware,so it differs greatly from other saliency detection tasks.We provide nine observations on challenges and future directions,and outline several open issues.All the materials including models,datasets,benchmarking results,and supplemented light field datasets are publicly available at https://github.com/kerenfu/LFSOD-Survey.
基金supported in part by the National Natural Science Foundation of China under Grant No.62172228in part by an Open Project of the Key Laboratory of System Control and Information Processing,Ministry of Education(Shanghai Jiao Tong University,No.Scip202102).
文摘Salient object detection(SOD)in RGB and depth images has attracted increasing research interest.Existing RGB-D SOD models usually adopt fusion strategies to learn a shared representation from RGB and depth modalities,while few methods explicitly consider how to preserve modality-specific characteristics.In this study,we propose a novel framework,the specificity-preserving network(SPNet),which improves SOD performance by exploring both the shared information and modality-specific properties.Specifically,we use two modality-specific networks and a shared learning network to generate individual and shared saliency prediction maps.To effectively fuse cross-modal features in the shared learning network,we propose a cross-enhanced integration module(CIM)and propagate the fused feature to the next layer to integrate cross-level information.Moreover,to capture rich complementary multi-modal information to boost SOD performance,we use a multi-modal feature aggregation(MFA)module to integrate the modalityspecific features from each individual decoder into the shared decoder.By using skip connections between encoder and decoder layers,hierarchical features can be fully combined.Extensive experiments demonstrate that our SPNet outperforms cutting-edge approaches on six popular RGB-D SOD and three camouflaged object detection benchmarks.The project is publicly available at https://github.com/taozh2017/SPNet.
基金This work was supported by the National Natural Science Foundation of China(62176169,61703077,and 62102207).
文摘Previous video object segmentation approachesmainly focus on simplex solutions linking appearance and motion,limiting effective feature collaboration between these two cues.In this work,we study a novel and efficient full-duplex strategy network(FSNet)to address this issue,by considering a better mutual restraint scheme linking motion and appearance allowing exploitation of cross-modal features from the fusion and decoding stage.Specifically,we introduce a relational cross-attention module(RCAM)to achieve bidirectional message propagation across embedding sub-spaces.To improve the model’s robustness and update inconsistent features from the spatiotemporal embeddings,we adopt a bidirectional purification module after the RCAM.Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios(e.g.,motion blur and occlusion),and compares well to leading methods both for video object segmentation and video salient object detection.The project is publicly available at https://github.com/GewelsJI/FSNet.