Human pose tracking based on multi-feature fusion in videos
Cited by: 6
Abstract  Objective: The tracking accuracy of existing human pose tracking algorithms still needs to be improved, particularly for flexibly moving arm parts. To raise this accuracy, this paper proposes, for the first time, a human pose tracking method that combines visual spatiotemporal information with a deep learning network.  Method: During tracking, temporal information in the video is used to compute motion information for the human target region, and this motion information propagates the part pose models between frames. Because methods based on image spatial features detect relatively rigid parts such as the torso and head well but detect arms poorly, a lightweight deep learning network is constructed and trained to generate additional candidate samples for the arm parts. The network also produces an arm feature-consistency probability map, which is combined with the spatial information of the video to compute the optimal pose of each part; the parts are then recombined into a complete human pose tracking result.  Results: The method is validated on two challenging human pose tracking datasets, VideoPose2.0 and YouTube Pose, achieving average arm-joint tracking accuracies of 81.4% and 84.5%, respectively, a clear improvement over existing methods. Further experiments on VideoPose2.0 verify that the proposed additional lower-arm sampling and the arm feature-consistency computation effectively improve the tracking accuracy of pose joints.  Conclusion: The proposed method combining spatiotemporal information with a deep learning network effectively improves human pose tracking accuracy, with an especially marked gain for the flexibly moving lower-arm joints.

Objective: Human pose tracking in video sequences aims to estimate the pose of a certain person in each frame using image and video cues and to track the human pose consecutively throughout the entire video. This field has been increasingly investigated because the development of artificial intelligence and the Internet of Things has made human-computer interaction frequent. Robots or intelligent agents can understand human action and intention by visually tracking human poses. At present, researchers frequently use the pictorial structure model to express human poses and use inference methods for tracking. However, the tracking accuracy of current human pose tracking methods needs to be improved, especially for flexibly moving arm parts. Although different types of features describe different types of information, the crucial point of human pose tracking lies in utilizing and combining appropriate features. We investigate the construction of effective features to accurately describe the poses of different body parts and propose a method that combines video spatial and temporal features with deep learning features to improve the accuracy of human pose tracking. This paper presents a novel human pose tracking method that effectively uses various kinds of video information to optimize human pose tracking in video sequences.  Method: An evaluation criterion is needed to track a visual target, but the human pose is an articulated, complex visual target, and evaluating it as a whole leads to ambiguity. This paper therefore proposes a decomposable human pose expression model that tracks each part of the human body separately through the video and recombines the parts into an entire body pose in each single image. The human pose is expressed as a principal component analysis (PCA) model of trained contour shapes, similar to a puppet, and the contour of each part pose can be calculated from key points and model parameters. Because the human pose changes unpredictably, tracking while detecting improves the tracking accuracy, which differs from traditional visual tracking.
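The decomposable pose model described above represents each body-part contour with a trained PCA shape model, from which a part contour is rebuilt from the mean shape plus a weighted sum of basis vectors. The following is a minimal illustrative sketch of that reconstruction step; the function name, array shapes, and toy numbers are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def reconstruct_part_contour(mean_shape, components, coeffs):
    """Rebuild a 2-D part contour from a trained PCA shape model.

    mean_shape : (N, 2) array, mean contour points of the part
    components : (K, N*2) array, top-K PCA basis vectors (flattened x, y)
    coeffs     : (K,) array, shape parameters estimated for this frame

    Returns an (N, 2) array of contour points.
    """
    # mean + linear combination of deformation modes, then back to (N, 2)
    flat = mean_shape.reshape(-1) + components.T @ coeffs
    return flat.reshape(-1, 2)

# Illustrative toy model: a 4-point rectangular contour with 2 PCA modes.
mean_shape = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 2.0]])
components = np.vstack([
    np.tile([0.1, 0.0], 4),  # mode 1: shift all points in x
    np.tile([0.0, 0.1], 4),  # mode 2: shift all points in y
])
contour = reconstruct_part_contour(mean_shape, components,
                                   np.array([1.0, -1.0]))
# contour -> [[0.1, -0.1], [1.1, -0.1], [1.1, 1.9], [0.1, 1.9]]
```

Per-frame tracking then reduces to searching over the low-dimensional `coeffs` (plus key-point placement) for each part rather than over raw pixel contours, which is what makes part-wise tracking and recombination tractable.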
Authors: Ma Miao; Li Yibin; Wu Xianqing; Gao Jinfeng; Pan Haipeng (Zhejiang Sci-Tech University, Hangzhou 310018, China; Shandong University, Jinan 250100, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), 2020, No. 7, pp. 1459-1472 (14 pages). Indexed in CSCD and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (61803339, 61673245); Natural Science Foundation of Zhejiang Province (LQ19F030014, LQ18F030011); Zhejiang Sci-Tech University Young Researcher Innovation Special Project (2019Q035).
Keywords: human pose tracking; visual target tracking; human-computer interaction; deep learning network; probability map for joints
