Abstract
Robot grasp detection has long been a research focus in robotics, but when performing multi-object grasping tasks in complex environments, robots suffer from inaccurate grasp pose estimation. To address this problem, a Transformer-based grasp detection model, PTGNet (pyramid Transformer grasp network), is proposed. PTGNet adopts Transformer modules with a pyramid pooling structure and a multi-head self-attention mechanism. The pyramid pooling structure partitions and pools feature maps to capture semantic information at different levels and to reduce computational complexity. The multi-head self-attention mechanism, with its strong feature extraction capability, effectively extracts global information. Together they make PTGNet well suited to visual grasping tasks. To verify its performance, PTGNet was trained and tested on different datasets, and robot arm grasping experiments based on PTGNet were carried out in both simulated and real physical environments. The results show that PTGNet achieves accuracies of 98.2% and 94.8% on the Cornell and Jacquard datasets, respectively, which is competitive with existing methods. On a multi-object dataset, PTGNet generalizes better than the other detection models compared. In single-object and multi-object grasping experiments in the PyBullet simulation environment, the robot arm achieved average grasping success rates of 98.1% and 96.8%, respectively. In multi-object grasping experiments in a real physical environment, the average grasping success rate was 93.3%. These results demonstrate the effectiveness and superiority of PTGNet in predicting grasp poses for multiple objects in complex environments.
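The abstract does not give PTGNet's layer details, so the following is only a minimal sketch of the general pyramid-pooling-attention idea it describes, not the authors' implementation. Queries come from every position of the feature map, while keys and values come from a few average-pooled grids, so the attention matrix has size N x M rather than N x N. The pool sizes (1, 2, 3, 6), the four heads, the random projection weights, and the helper names `adaptive_avg_pool2d` and `pyramid_pooling_attention` are all hypothetical choices made for illustration.

```python
import numpy as np


def adaptive_avg_pool2d(x, out_size):
    """Average-pool an (H, W, C) map to (out_size, out_size, C).

    Bin boundaries follow the usual adaptive-pooling rule:
    start = floor(i * H / s), end = ceil((i + 1) * H / s).
    """
    h, w, c = x.shape
    out = np.empty((out_size, out_size, c), dtype=x.dtype)
    for i in range(out_size):
        r0, r1 = (i * h) // out_size, -(-((i + 1) * h) // out_size)
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, -(-((j + 1) * w) // out_size)
            out[i, j] = x[r0:r1, c0:c1].mean(axis=(0, 1))
    return out


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pyramid_pooling_attention(x, pool_sizes=(1, 2, 3, 6), num_heads=4, seed=0):
    """Multi-head attention whose keys/values come from pooled pyramid levels."""
    h, w, c = x.shape
    assert c % num_heads == 0, "channels must split evenly across heads"
    d = c // num_heads
    rng = np.random.default_rng(seed)
    # Hypothetical random projections; a trained model would learn these.
    wq, wk, wv, wo = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4))

    queries = x.reshape(h * w, c)  # N = H*W query tokens
    pooled = np.concatenate(
        [adaptive_avg_pool2d(x, s).reshape(s * s, c) for s in pool_sizes], axis=0
    )  # M = sum(s^2) key/value tokens, far fewer than N

    q = (queries @ wq).reshape(-1, num_heads, d).transpose(1, 0, 2)  # (heads, N, d)
    k = (pooled @ wk).reshape(-1, num_heads, d).transpose(1, 0, 2)   # (heads, M, d)
    v = (pooled @ wv).reshape(-1, num_heads, d).transpose(1, 0, 2)   # (heads, M, d)

    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, M)
    out = (attn @ v).transpose(1, 0, 2).reshape(h * w, c) @ wo  # merge heads
    return out.reshape(h, w, c), attn


x = np.random.default_rng(1).standard_normal((24, 24, 32))
y, attn = pyramid_pooling_attention(x)
print(y.shape)     # (24, 24, 32)
print(attn.shape)  # (4, 576, 50)
```

For a 24 x 24 map, each head attends from 576 queries to only 1 + 4 + 9 + 36 = 50 pooled tokens instead of 576. This is the kind of saving the abstract attributes to the pyramid pooling structure, while each query still sees a global, multi-scale summary of the map.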
Authors
YU Qingsong, XU Xiangrong, LIU Yinzhen (School of Mechanical Engineering, Anhui University of Technology, Maanshan 243032, China)
Source
Chinese Journal of Engineering Design, 2024, No. 2, pp. 238-247 (10 pages)
Indexed in: CSCD; Peking University core journals
Funding
Supported by the National Key Research and Development Program of China (2017YFE0113200).