期刊文献+

Design and Implementation of a Deep Learning-Based Speaker Recognition System for Open-Set Scenarios (cited by: 4)

A deep learning-based speaker recognition system for open set scenarios
Abstract: In real-world applications, speaker recognition accuracy is low when utterances are short or mixed with noise. This paper proposes an improved deep learning-based speaker recognition algorithm that is more robust to short-duration and noisy speech, and deploys the resulting model on an embedded device. The improvements target two components: the encoding layer and the loss function. For the encoding layer, NeXtVLAD, which is based on differential encoding, is used to model both the static and the dynamic speaker characteristics contained in the frame-level features. For the loss function, the cosine-prototypical loss, which comes from the few-shot learning framework, is fused with the additive margin softmax (AM-Softmax) classification loss to train the model. As a result, same-speaker embeddings cluster as tightly as possible in the feature space and different-speaker embeddings are pushed as far apart as possible. Finally, the algorithm is deployed on a Raspberry Pi platform, yielding a speaker recognition system with fast inference. Experimental results show that the improved system performs speaker recognition accurately and in real time across a variety of open-set scenarios and meets the requirements of practical applications.
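The fused loss described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It assumes the commonly used formulations of both terms. AM-Softmax subtracts a margin m from the target-class cosine and scales all logits by s. The cosine-prototypical term treats the last utterance of each speaker as the query and the mean of that speaker's remaining utterances as the prototype, and it uses a scaled cosine w·cos + b as the logit. The hyperparameters (s=30, m=0.2, w=10, b=-5), the fixed rather than learned w and b, the 1:1 weighting `lam`, and all function names are illustrative assumptions, not values or code taken from the paper. The sketch only computes the loss value; a real training setup would use an autodiff framework.

```python
import math

def _normalize(v):
    # L2-normalize a vector (epsilon guards against zero vectors).
    n = max(math.sqrt(sum(x * x for x in v)), 1e-12)
    return [x / n for x in v]

def _cos(a, b):
    # Cosine similarity between two vectors.
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))

def _log_softmax_at(logits, idx):
    # Numerically stable log-softmax evaluated at index idx.
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return logits[idx] - lse

def am_softmax_loss(embeddings, labels, class_weights, s=30.0, m=0.2):
    """Additive-margin softmax: logit_j = s * (cos(e, W_j) - m * [j == y])."""
    total = 0.0
    for e, y in zip(embeddings, labels):
        logits = [s * (_cos(e, w) - (m if j == y else 0.0))
                  for j, w in enumerate(class_weights)]
        total -= _log_softmax_at(logits, y)
    return total / len(embeddings)

def cosine_prototypical_loss(batch, w=10.0, b=-5.0):
    """batch[i] holds >= 2 utterance embeddings of speaker i (assumed layout).
    The last utterance is the query; the mean of the others is the prototype."""
    prototypes = [[sum(col) / (len(utts) - 1) for col in zip(*utts[:-1])]
                  for utts in batch]
    queries = [utts[-1] for utts in batch]
    total = 0.0
    for i, q in enumerate(queries):
        # Each query should be closest to its own speaker's prototype.
        logits = [w * _cos(q, p) + b for p in prototypes]
        total -= _log_softmax_at(logits, i)
    return total / len(queries)

def fused_loss(batch, labels, class_weights, lam=1.0):
    """Hypothetical fusion: AM-Softmax over all utterances + lam * prototypical term."""
    flat = [u for utts in batch for u in utts]
    flat_labels = [labels[i] for i, utts in enumerate(batch) for _ in utts]
    return (am_softmax_loss(flat, flat_labels, class_weights)
            + lam * cosine_prototypical_loss(batch))

if __name__ == "__main__":
    # Two speakers, two 2-D utterance embeddings each (toy data).
    batch = [[[1.0, 0.1], [0.9, 0.0]], [[0.0, 1.0], [0.1, 0.9]]]
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy class-weight vectors for AM-Softmax
    print(fused_loss(batch, [0, 1], W))
```

The margin in AM-Softmax makes the classification task harder than plain softmax, which tightens each speaker's cluster. The prototypical term directly compares utterances with each other, which matches the open-set verification setting, where test speakers are not among the training classes.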
Authors: GUO Xin, LUO Chengfang, DENG Aiwen (School of Mechanical and Electrical Engineering, Guangdong Communication Polytechnic, Guangzhou 510520; School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641)
Source: Journal of Nanjing University of Information Science & Technology (Natural Science Edition), 2021, No. 5, pp. 526-532 (7 pages). Indexed in CAS; Peking University Core Journal.
Funding: Guangdong Provincial Young Innovative Talents Project (2018GkQNCX005).
Keywords: deep learning; open set; short-duration speech; speaker recognition; differential encoding; NeXtVLAD; Raspberry Pi (RPi)