Human-UAV swarm multi-modal intelligent interaction methods (cited by: 4)
Abstract To address interactive collaborative perception between humans and UAV swarms, an interaction framework is built with deep learning in which two models, one for speech and one for gestures, autonomously recognize commands for cooperative control of swarm formation. A channel fusion mechanism based on dual-channel switching is proposed to achieve multimodal interaction. The speech recognition model, based on Streaming Multi-Layer Truncated Attention (SMLTA) and provided by the Baidu cloud platform, is self-trained on a deep learning platform, raising its accuracy in the application scenario from 80.10% to 97.98%. Combining the depth and skeleton information from Kinect V2, a feature-fusion Convolutional Neural Network (CNN) gesture recognition model is constructed and trained; its average precision is 98.33%, which is 1.16% higher than that of a traditional decision tree model and 0.33% higher than that of a traditional CNN model. Finally, simulation and physical verification are carried out in a Robot Operating System (ROS)-Gazebo training scenario. The results show that the proposed framework can effectively control UAV swarm formation: the command execution success rates of the speech channel, the gesture channel, and channel switching all exceed 90%, with high interaction efficiency.
Authors SU Lingfei (苏翎菲); HUA Yongzhao (化永朝); DONG Xiwang (董希旺); REN Zhang (任章) (School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; Institute of Artificial Intelligence, Beihang University, Beijing 100191, China)
Source Acta Aeronautica et Astronautica Sinica (《航空学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2022, Issue S01, pp. 129-142 (14 pages)
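Funding Basic Pre-research Project of the State Administration of Science, Technology and Industry for National Defense (JCKY2019601C106)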
Keywords deep learning; human-computer interaction; UAV swarm; speech recognition; gesture recognition
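
The abstract names a channel fusion mechanism based on dual-channel switching but gives no implementation details. The following is only a minimal sketch of one way such switching could behave, assuming that exactly one channel is active at a time and that a dedicated command hands control to the other channel. The class `ChannelSwitcher`, the command labels, and the `switch_channel` keyword are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# All names below (channels, command labels, the switch keyword) are
# hypothetical; the abstract does not specify them.
SPEECH, GESTURE = "speech", "gesture"
SWITCH = "switch_channel"


@dataclass
class ChannelSwitcher:
    """Forward commands from one active channel; switch on request."""
    active: str = SPEECH
    log: List[str] = field(default_factory=list)

    def handle(self, channel: str, label: str) -> Optional[str]:
        # Ignore recognizer output from the channel that is not active.
        if channel != self.active:
            return None
        # A switch command hands control to the other channel.
        if label == SWITCH:
            self.active = GESTURE if self.active == SPEECH else SPEECH
            return None
        # Anything else is treated as a formation command for the swarm.
        self.log.append(label)
        return label


def run_demo() -> ChannelSwitcher:
    sw = ChannelSwitcher()
    events = [
        (SPEECH, "line_formation"),
        (GESTURE, "circle_formation"),  # ignored: gesture is inactive
        (SPEECH, SWITCH),
        (GESTURE, "circle_formation"),
        (SPEECH, "line_formation"),     # ignored: speech is now inactive
    ]
    for channel, label in events:
        sw.handle(channel, label)
    return sw
```

In this sketch, recognizer output from the inactive channel is simply dropped, which keeps a stray gesture from being executed while the operator is speaking. A real system would also need recognition confidence thresholds and a link to the swarm formation controller, neither of which is modeled here.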