Adaptive keyframe selection for continuous sign language recognition (面向连续手语识别的自适应关键帧选择)


Abstract: Vision-based continuous sign language recognition (CSLR) aims to recognize the sequence of signs in an unsegmented image sequence and can serve as a convenient assistive tool for sign language users. Most existing CSLR approaches extract visual and temporal features frame by frame, so the similar visual information in adjacent frames leads to a large amount of redundant computation. This paper analyzes the impact of frame rate on CSLR algorithms and finds that reducing the frame rate significantly improves computational efficiency but also causes some loss in recognition performance. To preserve more key sign language information while reducing the frame rate, the paper proposes an adaptive dynamic temporal pooling (ADTP) layer, which dynamically downsamples a sequence based on the self-similarity of its features. On this basis, a two-stage training scheme is introduced to make fuller use of the spatiotemporal information available at the original frame rate. Specifically, the first stage trains only a CSLR model on sequences at the original frame rate; this model then serves as the teacher network and guides, through knowledge distillation, the second-stage training of the model that contains the ADTP module. Experimental results show that the proposed method substantially reduces the computation required for recognition at the cost of a small loss in performance. In addition, ADTP can be applied to structural analysis of sign language videos, generating concise and intuitive video summaries.
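The abstract describes ADTP only at a high level, as dynamic downsampling driven by the self-similarity of sequence features. The following is a minimal sketch of that general idea, not the authors' ADTP layer: it assumes a greedy rule in which a new segment starts whenever a frame's cosine similarity to the current segment's first frame falls below a threshold, and each segment is then mean-pooled into one feature vector. The names `cosine` and `similarity_pool` and the `threshold` parameter are hypothetical and introduced here for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_pool(features, threshold=0.9):
    """Greedy similarity-based temporal downsampling (illustrative assumption).

    features:  list of per-frame feature vectors (lists of floats).
    threshold: hypothetical cutoff; a frame whose similarity to the first
               frame of the current segment is below it opens a new segment.

    Returns (pooled, starts): one mean-pooled vector per segment and the
    index of the first frame of each segment.
    """
    if not features:
        raise ValueError("features must contain at least one frame")

    # Pass 1: find segment boundaries by comparing each frame with the
    # first frame of the segment currently being built.
    starts = [0]
    for t in range(1, len(features)):
        if cosine(features[t], features[starts[-1]]) < threshold:
            starts.append(t)

    # Pass 2: average the frames inside each segment.
    bounds = starts + [len(features)]
    pooled = []
    for begin, end in zip(bounds[:-1], bounds[1:]):
        segment = features[begin:end]
        pooled.append([sum(column) / len(segment) for column in zip(*segment)])
    return pooled, starts


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0], [1.0, 0.0]]
    pooled, starts = similarity_pool(frames, threshold=0.9)
    print(starts)
    print(pooled)
```

In this sketch the output length depends on the content of the sequence rather than on a fixed stride, which is the property the abstract attributes to dynamic downsampling; the segment start indices can also be read as candidate keyframes for a video summary.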

Authors: Yuecong MIN (闵越聪); Xilin CHEN (陈熙霖)

Affiliations: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Computing Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China

Source: Scientia Sinica Informationis (《中国科学：信息科学》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 4, pp. 893-910 (18 pages)

Funding: Supported by the National Science and Technology Major Project on New Generation Artificial Intelligence (Grant No. 2021ZD0111900).

Keywords: continuous sign language recognition; time series analysis; visual languages; knowledge distillation; computational efficiency

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
