
Speaker Clustering Algorithm Based on Two Distance Metrics (cited by 1)
Abstract: Speaker clustering addresses the problem of grouping the speech segments in a recording by speaker identity, so that all speech from the same speaker ends up in one cluster. This paper proposes a clustering algorithm that combines two distance metrics, the Generalized Likelihood Ratio (GLR) and the Normalized Cross Likelihood Ratio (NCLR). The algorithm first extracts Mel-Frequency Cepstral Coefficients (MFCCs) from each speech segment and models them with a Gaussian Mixture Model (GMM). It then clusters the segments with a hierarchical strategy based on the combined GLR and NCLR metrics, using the Bayesian Information Criterion (BIC) to decide when clustering stops. Experiments show that the algorithm improves the overall performance of the system and handles unsupervised speaker clustering well: combining the two metrics improves system performance by 6% compared with using either metric alone. In addition, by changing how inter-cluster distances are updated, clustering runs 6 times faster than the traditional GMM clustering method.
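For reference, the three quantities named in the abstract are commonly defined as below. These are standard formulations from the speaker-diarization literature, assumed here because the abstract does not give the paper's exact equations. $X_i, X_j$ are two segments (or clusters) with $N_i, N_j$ feature frames, $M_i, M_j$ are models trained on each, $M_{ij}$ is a model trained on their union, $\lambda$ is a tunable penalty weight, and $P$ is the number of extra free parameters needed to describe the data with two models instead of one.

```latex
d_{\mathrm{GLR}}(X_i, X_j) = \log L(X_i \mid M_i) + \log L(X_j \mid M_j) - \log L(X_i \cup X_j \mid M_{ij})

d_{\mathrm{NCLR}}(X_i, X_j) = \frac{1}{N_i}\log\frac{L(X_i \mid M_i)}{L(X_i \mid M_j)}
                            + \frac{1}{N_j}\log\frac{L(X_j \mid M_j)}{L(X_j \mid M_i)}

\Delta\mathrm{BIC}(X_i, X_j) = d_{\mathrm{GLR}}(X_i, X_j) - \frac{\lambda}{2}\, P \log(N_i + N_j)

% Closed form when each model is a single full-covariance Gaussian over d-dimensional
% features, with ML covariances \Sigma_i, \Sigma_j, \Sigma_{ij}:
d_{\mathrm{GLR}} = \frac{N_i + N_j}{2}\log\lvert\Sigma_{ij}\rvert
                 - \frac{N_i}{2}\log\lvert\Sigma_i\rvert
                 - \frac{N_j}{2}\log\lvert\Sigma_j\rvert,
\qquad P = d + \frac{d(d+1)}{2}
```

A negative $\Delta\mathrm{BIC}$ means the single merged model is preferred, so the pair may be treated as one speaker; a typical stopping rule ends agglomeration once the selected pair's $\Delta\mathrm{BIC}$ is no longer negative.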
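Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2015, No. 10, pp. 2369-2373 (5 pages); indexed in CSCD and the Peking University Core Journals list
Funding: Supported by the National High Technology Research and Development Program of China (863 Program), project no. 2014AA015104
Keywords: speaker clustering; Generalized Likelihood Ratio (GLR); Normalized Cross Likelihood Ratio (NCLR); Bayesian Information Criterion (BIC)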
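A minimal sketch of the pipeline the abstract describes, under several loudly stated assumptions. Each cluster is modeled by a single diagonal-covariance Gaussian rather than a GMM. MFCC extraction is skipped, so segments arrive directly as lists of feature frames. The GLR/NCLR fusion rule (`combined_distance`, weight `alpha`) and the BIC penalty weight `lam` are hypothetical, since the abstract does not say how the two metrics are combined. The distance cache, which recomputes only pairs involving a newly merged cluster, is a generic speed-up and not necessarily the update scheme the paper credits for its 6x speedup. All function names are illustrative.

```python
import math
from itertools import combinations

VAR_FLOOR = 1e-6  # hypothetical variance floor so log(variance) stays finite


def fit_diag_gaussian(frames):
    """ML estimate (per-dimension mean and variance) of a diagonal Gaussian."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    var = [max(sum((f[k] - mean[k]) ** 2 for f in frames) / n, VAR_FLOOR)
           for k in range(dim)]
    return mean, var


def log_likelihood(frames, model):
    """Total log-likelihood of all frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def glr(xi, xj):
    """GLR: log L(Xi|Mi) + log L(Xj|Mj) - log L(Xi u Xj|Mij)."""
    merged = xi + xj
    return (log_likelihood(xi, fit_diag_gaussian(xi))
            + log_likelihood(xj, fit_diag_gaussian(xj))
            - log_likelihood(merged, fit_diag_gaussian(merged)))


def nclr(xi, xj):
    """Symmetric cross likelihood ratio, each term normalized by frame count."""
    mi, mj = fit_diag_gaussian(xi), fit_diag_gaussian(xj)
    return ((log_likelihood(xi, mi) - log_likelihood(xi, mj)) / len(xi)
            + (log_likelihood(xj, mj) - log_likelihood(xj, mi)) / len(xj))


def delta_bic(xi, xj, lam=1.0):
    """GLR minus the BIC penalty; a diagonal Gaussian adds 2*dim parameters."""
    dim = len(xi[0])
    n = len(xi) + len(xj)
    return glr(xi, xj) - 0.5 * lam * (2 * dim) * math.log(n)


def combined_distance(xi, xj, alpha=0.5):
    """Hypothetical fusion: per-frame GLR and NCLR mixed with weight alpha."""
    per_frame_glr = glr(xi, xj) / (len(xi) + len(xj))
    return alpha * per_frame_glr + (1 - alpha) * nclr(xi, xj)


def cluster(segments, alpha=0.5, lam=1.0):
    """Bottom-up clustering: merge the closest pair while its delta-BIC < 0.

    Returns groups of input segment indices, one group per speaker.
    """
    clusters = {i: list(seg) for i, seg in enumerate(segments)}
    members = {i: [i] for i in clusters}
    # Cache pairwise distances; a merge only invalidates pairs touching it.
    dist = {(a, b): combined_distance(clusters[a], clusters[b], alpha)
            for a, b in combinations(clusters, 2)}
    next_id = len(segments)
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)
        if delta_bic(clusters[a], clusters[b], lam) >= 0:
            break  # two separate models win: treat as different speakers
        merged = clusters.pop(a) + clusters.pop(b)
        merged_members = members.pop(a) + members.pop(b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c in clusters:
            dist[(c, next_id)] = combined_distance(clusters[c], merged, alpha)
        clusters[next_id] = merged
        members[next_id] = merged_members
        next_id += 1
    return sorted(sorted(m) for m in members.values())
```

Because merging continues only while the closest pair still has a negative delta-BIC, the number of speakers does not have to be known in advance, which is what makes the clustering unsupervised.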
