
Pixel-level grasping pose detection for robots based on Transformer
Abstract: Robot grasping detection has long been a research focus in robotics, but robots face inaccurate pose estimation when performing multi-object grasping tasks in complex environments. To address this problem, a Transformer-based grasping detection model called PTGNet (pyramid Transformer grasp network) is proposed. PTGNet adopts Transformer modules with a pyramid pooling structure and a multi-head self-attention mechanism. The pyramid pooling structure segments and pools feature maps to capture semantic information at different levels and to reduce computational complexity, while the multi-head self-attention mechanism extracts global information through its strong feature extraction capability, making PTGNet well suited to visual grasping tasks. To verify the performance of PTGNet, the model was trained and tested on different datasets, and robot-arm grasping experiments based on PTGNet were carried out in both simulated and real physical environments. The results show that PTGNet achieves accuracies of 98.2% and 94.8% on the Cornell and Jacquard datasets, respectively, demonstrating competitive performance. On multi-object datasets, PTGNet exhibits excellent generalization ability compared with other detection models. In the single-object and multi-object grasping experiments conducted in the PyBullet simulation environment, the average grasping success rates of the robot arm reached 98.1% and 96.8%, respectively; in the multi-object grasping experiments conducted in the real physical environment, the average grasping success rate was 93.3%. The experimental results demonstrate the effectiveness and superiority of PTGNet in predicting multi-object grasping poses in complex environments.
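The abstract describes the pyramid pooling attention idea only at a high level. Below is a minimal, illustrative PyTorch sketch of a pooling-based multi-head self-attention block of the kind described there (multi-scale pooling of the feature map to form short key/value sequences, so global attention is computed at reduced cost). The class name, pooling ratios, and dimensions are assumptions for illustration, not the authors' PTGNet implementation.

# Illustrative sketch only: pyramid pooling combined with multi-head self-attention,
# in the spirit of the abstract; not the authors' PTGNet code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingAttention(nn.Module):
    def __init__(self, dim=64, num_heads=4, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes  # assumed output grid sizes of the pooling pyramid
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, H, W) feature map from the backbone
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)  # (B, H*W, C): queries keep full resolution
        # Pyramid pooling: pool the map to several coarse grids and concatenate the
        # resulting tokens, so the key/value sequence is much shorter than H*W.
        pooled = [
            F.adaptive_avg_pool2d(x, s).flatten(2).transpose(1, 2)
            for s in self.pool_sizes
        ]
        kv = torch.cat(pooled, dim=1)              # (B, sum(s*s), C)
        out, _ = self.attn(self.norm(q), kv, kv)   # global context at reduced cost
        return out.transpose(1, 2).reshape(b, c, h, w)

# Usage: feats = PyramidPoolingAttention()(torch.randn(1, 64, 56, 56))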
Authors: YU Qingsong (俞青松), XU Xiangrong (徐向荣), LIU Yinzhen (刘胤真) (School of Mechanical Engineering, Anhui University of Technology, Maanshan 243032, China)
Source: Chinese Journal of Engineering Design (《工程设计学报》), 2024, No. 2, pp. 238-247 (10 pages); indexed in CSCD and the Peking University Core Journals list
Funding: National Key Research and Development Program of China (2017YFE0113200)
Keywords: Transformer; pyramid pooling; grasp detection; multi-head self-attention