Modeling Pitch Contour of Chinese Mandarin Sentences with the PENTA Model (Cited by: 1)


Abstract: In continuous speech, the pitch contour of the same syllable can vary considerably with its context. The Parallel Encoding and Target Approximation (PENTA) model is applied here to Mandarin speech synthesis. To improve prediction accuracy, the method combines the Classification And Regression Tree (CART) with the PENTA model to predict pitch contours for Chinese syllables in different contexts. CART was first used to cluster the syllables' normalized pitch contours according to the syllables' contextual information and the distances between pitch contours. The average pitch contour of each cluster was then used to train the PENTA model. Because the PENTA model requires an initial pitch to predict a continuous pitch contour, a Pitch Discontinuity Model (PDM) was used to predict the initial pitches at positions with voiceless consonants and prosodic boundaries. Initial tests on a corpus of 2048 Chinese four-syllable words were extended to tests on a continuous speech corpus of 5445 sentences. The results are satisfactory in terms of the Root Mean Square Error (RMSE) between the predicted and original pitch contours. The method can model pitch contours of Mandarin sentences for arbitrary text in speech synthesis.
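
The abstract evaluates predictions by the RMSE between the predicted and the original pitch contour. A minimal sketch of that metric follows, assuming both contours are sampled at the same time points and expressed in the same unit. The function name `contour_rmse` and the sample values are hypothetical illustrations, not data or code from the paper.

```python
import math


def contour_rmse(predicted, original):
    """Root mean square error between two equal-length pitch contours."""
    if len(predicted) != len(original):
        raise ValueError("contours must be sampled at the same points")
    if not predicted:
        raise ValueError("contours must not be empty")
    # Squared difference at each sample point, averaged, then square-rooted.
    squared = [(p - o) ** 2 for p, o in zip(predicted, original)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical contours (illustrative values only, not taken from the paper).
original = [200.0, 210.0, 220.0, 215.0]
predicted = [203.0, 206.0, 220.0, 215.0]
rmse = contour_rmse(predicted, original)
print(rmse)
```

A lower value means the predicted contour follows the original more closely. The abstract does not state whether the error is computed per syllable or per sentence, so this sketch leaves that choice to the caller.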

Authors: Hui Pang, Zhiyong Wu, Lianhong Cai

Affiliation: Tsinghua-CUHK Joint Research Center for Media Sciences

Source: Tsinghua Science and Technology (indexed in EI, CAS), 2012, Issue 2, pp. 218-224, 7 pages in total. Chinese journal title: 清华大学学报（自然科学版）（英文版）

Funding: Supported by the National Natural Science Foundation of China (Nos. 60805008, 60928005, and 61003094) and the Ph.D. Programs Foundation of the Ministry of Education of China (No. 200800031015)

Keywords: speech synthesis; PENTA model; prosody analysis; prosody modeling

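Classification: TN912.33 [Electronics and Telecommunications: Communication and Information Systems]; TQ052.5 [Electronics and Telecommunications: Information and Communication Engineering]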

