
A Hierarchical Clustering Method for Non-Uniform Units in Speech Corpus Pruning (cited by: 1)

Published English title: A Non-Uniform Clustering Synthesis Instances Pruning Approach for Corpus-Based TTS
Abstract: Extensive use of non-uniform units is an important guarantee of synthesis quality in large-corpus speech synthesis. However, speech-corpus pruning (tailoring the TTS voice font by removing redundant synthesis instances) usually causes a loss of non-uniform units.

To address this key problem, this paper proposes the NuClustering-VPA algorithm. It clusters non-uniform unit variants at different granularities and uses the higher-order clustering results to adjust the clustering of lower-order variants, so that the lower-order cluster centers are biased accordingly. Concretely, higher-level non-uniform units that contain the same syllables are clustered into several centers. These centers are then projected onto the lower-level non-uniform units to guide their clustering. By retaining the most important non-uniform units, NuClustering-VPA effectively reduces the damage that pruning does to them.

In listening tests scored by MOS, synthesis naturalness with NuClustering-VPA dropped only slightly and remained at a high level, even at a corpus reduction rate of 39.63%. The technique has been applied in speech products of iFLYTEK Co., Ltd.
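The abstract describes NuClustering-VPA only at a high level. What follows is a toy sketch of the general idea under stated assumptions; it is not the authors' algorithm. It makes these assumptions:

- Each syllable instance is a fixed-length feature vector.
- A higher-level unit (for example, a two-syllable word) is the concatenation of its syllables' vectors.
- "Projecting" a higher-level center means slicing out the block for one syllable.
- "Guiding" the lower-level clustering means seeding k-means with the projected centers and pulling each updated center toward its anchor with a weight `alpha`.

The pruning rule (keep the instance nearest each center) and the reduction-rate definition (fraction of instances removed) are also assumptions. All function names and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def assign(points: List[Vector], centers: List[Vector]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: dist2(p, centers[k])) for p in points]


def kmeans(points: List[Vector], init: List[Vector], iters: int = 20,
           anchors: Optional[List[Vector]] = None,
           alpha: float = 0.0) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means. If anchors are given (hypothetical bias rule), each
    updated center becomes (1 - alpha) * cluster_mean + alpha * anchor."""
    centers = [list(c) for c in init]
    for _ in range(iters):
        labels = assign(points, centers)
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                continue  # keep the previous center if its cluster is empty
            m = mean(members)
            if anchors is not None:
                m = [(1 - alpha) * x + alpha * a for x, a in zip(m, anchors[k])]
            centers[k] = m
    return centers, assign(points, centers)


def project(center: Vector, position: int, dim: int) -> Vector:
    """Hypothetical projection: slice out the feature block of the syllable
    at `position` inside a concatenated higher-level center."""
    return center[position * dim:(position + 1) * dim]


def prune(points: List[Vector], centers: List[Vector], labels: List[int]) -> List[int]:
    """Hypothetical pruning: keep only the instance nearest each non-empty cluster's center."""
    kept = []
    for k, c in enumerate(centers):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        if idx:
            kept.append(min(idx, key=lambda i: dist2(points[i], c)))
    return sorted(kept)


# Made-up toy data: 1-D features per syllable; each higher-level unit is a
# two-syllable word whose first syllable is the target syllable.
dim = 1
high = [[1.0, 5.0], [1.2, 5.1], [3.0, 0.0], [3.1, 0.2]]
low = [[1.0], [1.1], [1.3], [2.0], [2.9], [3.0], [3.2]]

high_centers, _ = kmeans(high, init=[high[0], high[2]])
anchors = [project(c, 0, dim) for c in high_centers]  # higher-level -> lower-level
low_centers, labels = kmeans(low, init=anchors, anchors=anchors, alpha=0.5)
kept = prune(low, low_centers, labels)
reduction_rate = 1 - len(kept) / len(low)  # fraction of instances removed
print(kept, round(reduction_rate, 4))  # prints: [2, 5] 0.7143
```

With `alpha = 0` the lower-level step reduces to ordinary k-means. A larger `alpha` keeps the lower-level centers closer to the projections of the higher-level centers, which is one simple way to express the bias the abstract describes. In this toy run, the first lower-level center settles at about 1.225 rather than the plain cluster mean of 1.35.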
Source: Chinese Journal of Computers (《计算机学报》), 2007, No. 11, pp. 2017-2024 (8 pages). Indexed in: EI, CSCD, Peking University Core Journals.
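Funding: Supported by the National Natural Science Foundation of China (60602017) and the National High Technology Research and Development Program of China ("863" Program) (2004AA114030).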
Keywords: corpus-based TTS; tailoring TTS voice font (speech corpus pruning); pruning redundant synthesis instances; scalable TTS


