A Hierarchical Clustering Method over Non-Uniform Units for Speech Database Pruning

A Non-Uniform Clustering Synthesis Instances Pruning Approach for Corpus-Based TTS

Abstract: Heavy use of non-uniform units is an important guarantee of synthesis quality in corpus-based TTS, yet pruning the voice database (tailoring the TTS voice font by removing redundant synthesis instances) usually causes a loss of non-uniform units. To address this key problem, this paper proposes the NuClustering-VPA algorithm. Variants of non-uniform units at different granularities are clustered, and the clustering of low-level variants is adjusted according to the high-level clustering results, so that the low-level cluster centers are biased accordingly. Specifically, high-level non-uniform units containing the same syllables are clustered into several centers, and these centers are then projected onto the low-level non-uniform units, where the projections guide the low-level clustering. NuClustering-VPA thus retains the most important non-uniform units and reduces the damage that pruning does to them. Listening tests show that even at a voice-database reduction rate of 39.63%, the naturalness measured by MOS degrades only slightly and remains at a high level. The technique has been applied in commercial speech products of iFlytek.
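The two-level idea in the abstract can be illustrated with a small sketch. It is not the paper's NuClustering-VPA implementation: the abstract does not give the features, the distance measure, or the biasing rule, so everything below is an assumption. Units are taken to be plain feature tuples, clustering is ordinary k-means with deterministic seeding, and the hypothetical `bias` factor simply shrinks the distance of any low-level instance that belongs to a retained high-level unit. The names `kmeans`, `pick_representatives` and `prune_hierarchically` are invented for this illustration.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old center.
        centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    labels = [min(range(len(centers)), key=lambda i: dist(p, centers[i])) for p in points]
    return centers, labels


def pick_representatives(points, k, preferred=frozenset(), bias=0.5):
    """Cluster `points` and keep one instance index per cluster.

    Assumption: indices in `preferred` have their distance to the centroid
    multiplied by `bias` (< 1), so the kept instance leans toward them.
    """
    centers, labels = kmeans(points, k)
    kept = set()
    for c_idx, center in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c_idx]
        if not members:
            continue

        def cost(i):
            d = dist(points[i], center)
            return d * bias if i in preferred else d

        kept.add(min(members, key=cost))
    return kept


def prune_hierarchically(low_feats, high_units, k_high, k_low, bias=0.5):
    """Cluster high-level units first, then let them guide the low level.

    `high_units` is a list of (feature, covered_low_indices) pairs, where
    covered_low_indices names the low-level instances the unit spans.
    """
    kept_high = pick_representatives([feat for feat, _ in high_units], k_high)
    # Project the retained high-level units onto their low-level instances.
    projected = {i for h in kept_high for i in high_units[h][1]}
    kept_low = pick_representatives(low_feats, k_low, preferred=projected, bias=bias)
    return kept_high, kept_low


if __name__ == "__main__":
    low = [(0.0,), (0.4,), (1.0,), (5.0,), (5.6,), (6.0,)]
    high = [((0.0, 0.0), (0, 5)), ((1.0, 1.0), (2, 3)), ((2.0, 2.0), (1, 4))]
    print(prune_hierarchically(low, high, k_high=1, k_low=2, bias=0.1))
```

In this toy data the unbiased low-level clustering would keep the instances closest to each centroid, whereas with a strong bias the kept instances shift to those covered by the retained high-level unit. This is one simple way to read "the low-level cluster centers are biased", and the real algorithm may differ substantially.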

Authors: Zhang Wei, Wu Xiaoru, Liu Jiang, Wang Renhua

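Affiliations: Department of Computer Science, Ocean University of China; Anhui USTC iFlytek Information Technology Co., Ltd.; Department of Electronic Engineering and Information Science, University of Science and Technology of China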

Source: Chinese Journal of Computers (《计算机学报》), 2007, No. 11, pp. 2017-2024 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

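Funding: Supported by the National Natural Science Foundation of China (60602017) and the National High Technology Research and Development Program of China ("863" Program) (2004AA114030).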

Keywords: corpus-based TTS; tailoring TTS voice font; pruning redundant synthesis instances; scalable TTS

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

