Deep Belief Network Based Speaker Information Extraction Method (cited by: 5)


Abstract: In i-vector based speaker verification systems, speaker-discriminative information must be further extracted from the i-vector representation of each utterance to improve system performance. Drawing on the idea of the anchor model, this paper proposes a deep belief network based modeling method. The method analyzes and models the complex variabilities contained in i-vectors layer by layer and extracts the speaker-related information through non-linear transformations. On the telephone-training, telephone-testing condition of the NIST SRE 2008 core test, the equal error rates (EER) for male and female trials are 4.96% and 6.18% respectively. Fusing the proposed system with a linear discriminant analysis based system further reduces the EERs to 4.74% and 5.35%.

Authors: Chen Liping (陈丽萍), Wang Eryu (王尔玉), Dai Lirong (戴礼荣), Song Yan (宋彦)

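Affiliations: Department of Electronic Engineering and Information Science, University of Science and Technology of China; Tencent Holdings Limited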

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, No. 12, pp. 1089-1095 (7 pages)

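Funding: Supported by the National Natural Science Foundation of China (No. 61273264) and the National 973 Program preliminary research special project (No. 2012CB326405)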

Keywords: i-Vector, Speaker Verification, Deep Belief Network, Anchor Model

Classification: TN912.34 [Electronics and Telecommunications - Communication and Information Systems]


