Abstract: Manhole cover defect recognition is of significant practical importance because it can accurately identify damaged or missing covers, enabling timely replacement and maintenance. Traditional manhole cover detection techniques primarily focus on detecting the presence of covers rather than classifying the types of defects. However, manhole cover defects exhibit small inter-class feature differences and large intra-class feature variations, which makes their recognition challenging. To improve the classification of manhole cover defect types, we propose a Progressive Dual-Branch Feature Fusion Network (PDBFFN). The baseline backbone network adopts a multi-stage hierarchical architecture that uses ResNet50 as the visual feature extractor, from which both local and global information is obtained. Additionally, a Feature Enhancement Module (FEM) and a Fusion Module (FM) are introduced to enhance the network’s ability to learn critical features. Experimental results demonstrate that our model achieves a classification accuracy of 82.6% on a manhole cover defect dataset, outperforming several state-of-the-art fine-grained image classification models.
Funding: Supported by the Science Center Program of the National Natural Science Foundation of China (Grant No. 62188101); the SiYuan Collaborative Innovation Alliance of Artificial Intelligence Science and Technology (Grant No. HTKJ2023SY502003); the Heilongjiang Touyan Team; the Guangdong Major Project of Basic and Applied Basic Research (Grant No. 2019B030302001); and the Shanghai Aerospace Science and Technology Innovation Foundation (Grant No. SAST2021-033).
Abstract: To improve the accuracy of reconstruction-discrepancy-based deep learning methods in satellite anomaly detection, this study proposes a dual-branch reconstruction model (DBRM) and designs a comprehensive satellite anomaly detection framework around it. First, we introduce the temporal-channel mixer (TC-Mixer) module, which mainly comprises a self-attention layer for capturing long-range temporal dependencies in telemetry data and two types of feed-forward networks (FFNs) for extracting complex patterns along the temporal and channel dimensions. This design gives the TC-Mixer module a robust ability to extract complicated dependencies in telemetry data. Second, with the TC-Mixer module as the main component, we design the DBRM. The model uses a shared latent representation layer, so the regeneration branch and the forecasting branch share most of the feature extraction network. This approach significantly improves the model’s regression accuracy while reducing computational complexity. Third, using the DBRM as the core network, we devise a comprehensive satellite anomaly detection framework. It includes an anomaly criterion that considers the reconstruction discrepancy of both the regeneration and forecasting branches, the peak-over-threshold (POT) method for anomaly thresholding, and an MIC-based feature engineering method, among other components. Finally, we conduct comparative experiments against several state-of-the-art anomaly detection algorithms on two public satellite anomaly detection datasets and one private one. The experimental results validate the effectiveness and superiority of the proposed method.
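The anomaly criterion in the abstract above combines the discrepancies of both branches and then applies a threshold. The following is a minimal sketch of that idea, not the authors' implementation. All names (`combined_score`, `quantile_threshold`, `detect`) and the equal weighting `alpha=0.5` are hypothetical, and the threshold is a plain empirical quantile used as a stand-in for the peak-over-threshold (POT) method named in the abstract.

```python
from typing import List, Sequence


def combined_score(actual: Sequence[float],
                   regenerated: Sequence[float],
                   forecast: Sequence[float],
                   alpha: float = 0.5) -> List[float]:
    """Per-timestep score: weighted sum of both branches' absolute errors.

    alpha is an assumed weighting; the abstract does not state how the two
    discrepancies are combined.
    """
    return [alpha * abs(x - r) + (1.0 - alpha) * abs(x - f)
            for x, r, f in zip(actual, regenerated, forecast)]


def quantile_threshold(scores: Sequence[float], q: float = 0.99) -> float:
    """Empirical-quantile threshold (a simple substitute for POT)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def detect(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of timesteps whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    # Threshold calibrated on scores from (assumed) anomaly-free data.
    threshold = quantile_threshold([0.1, 0.2, 0.3, 0.4], q=0.75)
    scores = combined_score(actual=[1.0, 1.0, 1.0, 5.0],
                            regenerated=[1.0, 1.0, 1.0, 1.0],
                            forecast=[1.0, 1.0, 1.0, 1.0])
    print(detect(scores, threshold))
```

A real POT threshold would fit a generalized Pareto distribution to the tail of the scores; the quantile here only keeps the sketch dependency-free.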
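Abstract: To address the low accuracy caused by varying building sizes and blurred edges in remote sensing building images, a network model named TC-UNet++ is proposed that fuses two parallel branches with attention mechanisms. Because convolutional neural networks are good at extracting local features but struggle to capture global information, a Transformer structure is introduced to solve the loss of global information. To handle the mismatch in feature dimensions and channel counts between the two structures, a TC (Transformer to CNN) module is designed to interactively fuse local and global features at different resolutions. A coordinate attention mechanism is introduced to locate and identify buildings according to the position of pixels in the image. Experimental results show that on the WHU dataset TC-UNet++ reaches an intersection over union (IoU), precision, and overall accuracy of 93.1%, 95.9%, and 98.8%, respectively, demonstrating good effectiveness without a significant increase in parameters.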
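For readers unfamiliar with the three scores reported above, the sketch below computes them for a binary building mask. It assumes the abstract's metrics follow the standard definitions of IoU, precision, and overall accuracy; the function names and the flat-list mask representation are illustrative choices, not part of the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count true/false positives and negatives for flat binary masks."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """IoU, precision, and overall accuracy for the building (positive) class."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    union = tp + fp + fn
    return {
        # IoU: overlap divided by union of predicted and true building pixels.
        "iou": tp / union if union else 1.0,
        # Precision: share of predicted building pixels that are correct.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Overall accuracy: share of all pixels labelled correctly.
        "overall_accuracy": (tp + tn) / len(pred),
    }


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0, 0, 0]
    reference = [1, 0, 0, 0, 1, 1, 0, 0]
    print(segmentation_metrics(predicted, reference))
```

Overall accuracy is usually the highest of the three on building datasets because background pixels dominate, which is consistent with the ordering of the reported values.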
Funding: Supported by the Open Project of the State Key Laboratory of Trauma and Chemical Poisoning (SKL202102); the Key R&D and Transformation of Science and Technology Projects in Tibet Autonomous Region (XZ2022RH001); the Chongqing Talents Program (CQYC2020030146); the Project of Chongqing Science and Technology Bureau (cstc2021ycjh-bgzxm0345); the Chongqing Bayu Scholar Program (DP2020036); and the Chongqing Entrepreneurship and Innovation Support Program for Overseas Students Returning to China.
Abstract: Tactile sensing gives robots abilities such as object recognition, fine manipulation, and natural interaction. In practical scenarios, however, robotic tactile recognition of similar objects still suffers from low efficiency and accuracy because of a lack of high-performance sensors and intelligent recognition algorithms. In this paper, a flexible sensor combining a pyramidal microstructure with a gradient conformal ionic gel coating is demonstrated. It exhibits an excellent signal-to-noise ratio (48 dB), a low detection limit (1 Pa), high sensitivity (92.96 kPa⁻¹), a fast response time (55 ms), and outstanding stability over 15,000 compression-release cycles. Furthermore, a Pressure-Slip Dual-Branch Convolutional Neural Network (PSNet) architecture is proposed to extract hardness and texture features separately and then fuse them. In tactile experiments on different kinds of leaves, a recognition rate of 97.16% was achieved, surpassing that of recognition by human hands (72.5%). These results show great potential for broad application in bionic robots, intelligent prostheses, and precise human-computer interaction.
Funding: Supported by the National Natural Science Foundation of China (No. 62001272).
Abstract: Extracting useful details from images is essential for Internet of Things (IoT) projects. In real life, however, external conditions such as bad weather can occlude key target information and distort images. This hinders the extraction of key information, affects the judgment of the real situation in IoT processes, and can cause system decision-making errors and accidents. This paper addresses occlusion caused by rain: the goal is to remove rain streaks from an image and obtain a clear, rain-free image. To that end, a single-image deraining algorithm is studied, and a dual-branch network structure based on an attention module and a convolutional neural network (CNN) module is proposed for rain removal. To achieve high-quality single-image rain removal, spatial attention, channel attention, and CNN modules are applied in the network, which is built with an encoder-decoder structure. In experiments that use structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) as evaluation metrics, training and testing results on a rain removal dataset show that the proposed structure performs well on the single-image deraining task.
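Of the two evaluation metrics named above, PSNR has a short closed form: PSNR = 10 · log10(MAX² / MSE). The sketch below shows it for images given as flat lists of pixel intensities. The function names and the 8-bit default `max_value=255.0` are assumptions for illustration; SSIM is omitted because it needs windowed statistics.

```python
import math
from typing import Sequence


def mean_squared_error(reference: Sequence[float], restored: Sequence[float]) -> float:
    """Mean squared pixel difference between two equally sized images."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mean_squared_error(reference, restored)
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


if __name__ == "__main__":
    clean = [100, 100, 100, 100]      # rain-free ground truth (toy values)
    derained = [110, 90, 110, 90]     # network output (toy values)
    print(f"PSNR: {psnr(clean, derained):.2f} dB")
```

Higher PSNR means the derained output is closer to the rain-free reference; the value depends on the assumed intensity range, so results are only comparable when `max_value` matches.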
Abstract: In nature, the most likely reason that the helix was chosen as the basic structure of life molecules is that it is the simplest chiral three-dimensional structure. The conversion from chemical molecules to double-helical molecules is completed by the topology effect, which is the simplest way to form a helix and needs no external power; moreover, the energy of the double helix has a fixed drive direction [1]. The dual-branch loop helix (II), the transition state of the double helix, has many uses: for example, it can be turned into a double helix, or it may be broken into two fragments, a and b, which can construct more complicated structures. The dual-branch loop helix (II) can therefore provide a special "building block" for assembling biomembranes and other life molecules.
Funding: This work was supported by the National Natural Science Foundation of China (No. 61871196 and 62001176), the Natural Science Foundation of Fujian Province of China (No. 2019J01082 and 2020J01085), and the Promotion Program for Young and Middle-aged Teachers in Science and Technology Research of Huaqiao University (ZQN-YX601).
Abstract: Monocular 6D pose estimation is a functional task in computer vision and robotics. In recent years, 2D-3D correspondence-based methods have achieved improved performance in multiview and depth-data-based scenes. For monocular 6D pose estimation, however, these methods are limited by the accuracy of the predicted 2D-3D correspondences and by the robustness of the perspective-n-point (PnP) algorithm, so a gap remains between their results and the expected estimation performance. To obtain a more effective feature representation, edge enhancement is proposed to increase the shape information of the object, based on an analysis of how inaccurate 2D-3D matching influences 6D pose regression and a comparison of the effectiveness of intermediate representations. Furthermore, although the transformation matrix from 3D model points to 2D pixel points is composed of rotation and translation matrices, the two variables are essentially different, and the same network cannot be used to regress both. Therefore, to improve the effectiveness of the PnP algorithm, this paper designs a dual-branch PnP network to predict rotation and translation information. Finally, the proposed method is verified on the public LM, LM-O, and YCB-Video datasets. Its ADD(-S) values are 94.2 and 62.84 on the LM and LM-O datasets, respectively, and its AUC of ADD(-S) on YCB-Video is 81.1. These experimental results show that the proposed method outperforms similar methods.
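The ADD score behind the results above measures how far model points land from their true positions under a predicted rotation and translation. The sketch below computes the plain ADD distance under the standard definition (mean Euclidean distance between correspondingly transformed points). Function names are hypothetical, and the symmetric variant ADD-S, which matches each point to its nearest neighbour, is not covered.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # 3x3 rotation, row-major


def transform(rotation: Matrix, translation: Vector, point: Vector) -> Vector:
    """Apply the rigid transform R * point + t."""
    x, y, z = (sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
               for i in range(3))
    return (x, y, z)


def add_distance(model_points: Sequence[Vector],
                 r_true: Matrix, t_true: Vector,
                 r_pred: Matrix, t_pred: Vector) -> float:
    """Mean distance between model points under the true and predicted poses."""
    if not model_points:
        raise ValueError("model_points must not be empty")
    total = 0.0
    for point in model_points:
        total += math.dist(transform(r_true, t_true, point),
                           transform(r_pred, t_pred, point))
    return total / len(model_points)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # Same rotation, translation off by 0.1 along z: every point shifts by 0.1.
    print(add_distance(points, identity, (0.0, 0.0, 0.0), identity, (0.0, 0.0, 0.1)))
```

Because rotation and translation enter this distance separately, an error in either one alone is enough to raise it, which is one way to see why the abstract treats them as distinct regression targets.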
Funding: Supported in part by the Natural Science Foundation of Shandong Province, No. ZR2021MH213, and in part by the Suzhou Science and Technology Bureau, No. SJC2021023.
Abstract: Sputum smear tests are critical for the diagnosis of respiratory diseases. Automatic segmentation of bacteria from sputum smear images is important for improving diagnostic efficiency. However, this remains a challenging task owing to the high interclass similarity among different categories of bacteria and the low contrast of the bacterial edges. To explore more levels of global pattern features that help distinguish bacterial categories, while maintaining sufficient local fine-grained features to ensure accurate localization of ambiguous bacteria, we propose a novel dual-branch deformable cross-attention fusion network (DB-DCAFN) for accurate bacterial segmentation. Specifically, we first design a dual-branch encoder consisting of multiple convolution and transformer blocks in parallel to simultaneously extract multilevel local and global features. We then design a sparse and deformable cross-attention module to capture the semantic dependencies between local and global features, which bridges the semantic gap and fuses features effectively. Furthermore, we design a feature assignment fusion module that enhances meaningful features using an adaptive feature weighting strategy to obtain more accurate segmentation. We conducted extensive experiments to evaluate the effectiveness of DB-DCAFN on a clinical dataset comprising three bacterial categories: Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa. The experimental results demonstrate that the proposed DB-DCAFN outperforms other state-of-the-art methods and is effective at segmenting bacteria from sputum smear images.
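Abstract: To address large variations in head scale and interference from complex backgrounds in dense crowd counting, a dual-branch dense crowd counting algorithm based on the self-attention mechanism is proposed. The algorithm combines two network frameworks, the convolutional neural network (CNN) and the Transformer: a multi-scale CNN branch and a Transformer branch built on a convolution-enhanced self-attention module capture local and global crowd information, respectively. A dual-branch attention fusion module is designed to provide crowd feature extraction across continuous scales, and a Transformer network based on a hybrid attention module extracts deep features to further separate complex backgrounds and focus on crowd regions. Experiments were conducted with both location-level full supervision and count-level weak supervision on datasets including ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, and JHU-Crowd++. The results show that the algorithm outperforms recent work on all four datasets. The fully supervised algorithm achieves mean absolute errors of 55.3, 6.7, 82.9, and 55.7 and root mean square errors of 93.1, 9.8, 145.1, and 248.0 on these datasets, respectively, enabling accurate counting in highly dense, heavily occluded scenes. In the comparison of weakly supervised algorithms in particular, it achieves better counting accuracy with a small number of parameters and reaches 87.9% of the fully supervised counting performance.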
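The two error measures reported above are computed over per-image crowd counts. The sketch below shows both under their standard definitions; the function names and the toy counts are illustrative and unrelated to the paper's data.

```python
import math
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("count lists must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def root_mean_square_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Square root of the average squared count difference."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("count lists must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    predicted_counts = [103, 200, 296]   # toy per-image predictions
    true_counts = [100, 200, 300]        # toy per-image ground truth
    print(mean_absolute_error(predicted_counts, true_counts))
    print(root_mean_square_error(predicted_counts, true_counts))
```

RMSE penalises large per-image misses more heavily than MAE, so a dataset with a few extremely dense images can show a much larger gap between the two, as with the last pair of values in the abstract.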