Abstract
Heavy use of non-uniform units is an important guarantee of quality in corpus-based speech synthesis, yet tailoring a TTS voice font, that is, pruning redundant synthesis instances, usually causes a loss of non-uniform units. To address this key problem, this paper proposes the NuClustering-VPA algorithm. Variants of non-uniform units are clustered at different granularities: high-level non-uniform units that contain the same syllables are clustered to several centers, and the centers are then projected onto the low-level non-uniform units. These projections guide the clustering of the low-level variants, so that the low-level cluster centers are biased accordingly. NuClustering-VPA thereby retains the most important non-uniform units and effectively reduces the damage that pruning does to them. Listening tests show that with NuClustering-VPA, even at a voice-font reduction rate of 39.63%, naturalness as scored by MOS degrades only slightly and remains at a high level. The technique has been applied in commercial speech products of iFLYTEK Co., Ltd.
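The abstract gives only the outline of the method, so the following is a minimal sketch of that outline, not the paper's implementation. It assumes each unit instance is a short feature vector, swaps in a simple greedy "leader" clustering with medoid selection for whatever clustering NuClustering-VPA actually uses, and models the bias as a fixed cost discount. The names `leader_clusters`, `medoid`, `prune`, `high_to_low`, `radius` and `bias` are all hypothetical.

```python
from math import dist


def leader_clusters(points, radius):
    """Greedy clustering: an instance joins the first cluster whose
    leader (first member) lies within `radius`, else starts a new one."""
    clusters = []
    for pid, vec in points.items():
        for members in clusters:
            if dist(points[members[0]], vec) <= radius:
                members.append(pid)
                break
        else:
            clusters.append([pid])
    return clusters


def medoid(members, points, protected=frozenset(), bias=0.0):
    """Pick the member with the lowest total distance to its cluster.
    Members in `protected` get their cost reduced by `bias` (assumed
    form of the bias toward projected high-level centers)."""
    def cost(pid):
        total = sum(dist(points[pid], points[q]) for q in members)
        return total - (bias if pid in protected else 0.0)
    return min(members, key=cost)


def prune(low_points, high_points, high_to_low, radius, bias):
    # 1. Cluster the high-level units and keep one center per cluster.
    high_kept = [medoid(c, high_points)
                 for c in leader_clusters(high_points, radius)]
    # 2. Project the kept centers onto the low-level instances they contain.
    protected = frozenset(s for h in high_kept for s in high_to_low[h])
    # 3. Cluster the low-level units, biasing centers toward the projections.
    low_kept = [medoid(c, low_points, protected, bias)
                for c in leader_clusters(low_points, radius)]
    return high_kept, low_kept


if __name__ == "__main__":
    low = {"s1": (0.0,), "s2": (0.1,), "s3": (0.3,), "s4": (5.0,)}
    high = {"h1": (0.0,), "h2": (0.2,), "h3": (0.3,)}
    contains = {"h1": ["s1"], "h2": ["s3"], "h3": ["s2"]}
    print(prune(low, high, contains, radius=1.0, bias=1.0))
```

With `bias=0.0` the low-level step reduces to ordinary medoid selection. A positive `bias` makes an instance that belongs to a retained high-level unit more likely to survive pruning, which is the effect the abstract attributes to the projection step.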
Source
Chinese Journal of Computers (《计算机学报》)
EI
CSCD
PKU Core Journals (北大核心)
2007, Issue 11, pp. 2017-2024 (8 pages)
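Funding
National Natural Science Foundation of China (60602017)
National High Technology Research and Development Program of China ("863" Program) (2004AA114030)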