Abstract
With the wide application of speaker recognition technology, the number of back-end speakers keeps growing. If the traditional approach of comparing against the speakers one by one is used, the amount of computation is large and real-time response is difficult, which greatly reduces the performance and practicality of the speaker recognition system. This paper therefore proposes a speaker recognition method based on model clustering. Because the traditional K-L divergence is asymmetric, it is not a good distance measure for clustering, and the clustering results it gives are poor. The paper proposes a Wasserstein distance clustering method based on an approximate model. Compared with the traditional speaker recognition method, this method improves recognition accuracy by nearly 4.7%, and its recognition time is only 25.5% of that of the traditional method, which greatly improves the performance and practicality of the speaker recognition system.
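The asymmetry argument can be illustrated with a minimal sketch. The abstract does not say what form the speaker models or the "approximate model" take, so the example below assumes univariate Gaussians purely for illustration. The function names `kl_gauss` and `w2_gauss` are hypothetical and are not taken from the paper. It uses the standard closed forms for the K-L divergence and the 2-Wasserstein distance between two univariate Gaussians.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """K-L divergence KL(N(m1, s1^2) || N(m2, s2^2)), univariate case."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def w2_gauss(m1, s1, m2, s2):
    """2-Wasserstein distance between two univariate Gaussians."""
    return math.sqrt((m1 - m2)**2 + (s1 - s2)**2)

# Two illustrative models as (mean, standard deviation); values are made up.
p = (0.0, 1.0)
q = (1.0, 2.0)

kl_pq = kl_gauss(*p, *q)   # divergence from p to q
kl_qp = kl_gauss(*q, *p)   # divergence from q to p, generally different
w_pq = w2_gauss(*p, *q)    # Wasserstein distance is the same in
w_qp = w2_gauss(*q, *p)    # both directions

print(f"KL(p||q) = {kl_pq:.4f}, KL(q||p) = {kl_qp:.4f}")
print(f"W2(p,q)  = {w_pq:.4f}, W2(q,p)  = {w_qp:.4f}")
```

In this toy case the two K-L values differ depending on which model is taken as the reference, while the Wasserstein distance is identical in both directions, which is the property a clustering distance typically needs.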
Authors
陈秉沃
张二华
唐振民
CHEN Bingwo; ZHANG Erhua; TANG Zhenmin (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094)
Source
《计算机与数字工程》
2023, No. 8, pp. 1745-1749, 1831 (6 pages)
Computer & Digital Engineering