Funding: Supported by the National Natural Science Foundation of China (61071215), the Science and Technology Foundation of Suzhou (SYG201033), and the Pre-research Foundation of Soochow University (Q311901111, 14317399).
Abstract: To address the weaknesses of the existing fixed-value mapping methods (method_F), a vocal tract system conversion method based on the universal background model (UBM) is proposed to improve the performance of a system that converts Chinese whispered speech to normal speech. Because the UBM has numerous components, the errors introduced by the acoustic probability density statistical model cannot be ignored. A method for selecting effective Gaussian mixture components, based on the posterior probability summation of the minimum spectral distortion, is therefore developed to optimize system performance. The proposed method (method_U) is analyzed and compared using a performance index (PI) based on the Itakura-Saito spectral distortion measure. Experiments show that method_U performs more stably than method_F across different speakers and different phonemes, and that its average PI is better. Selecting effective Gaussian mixture components further improves the PI of method_U by 5.11%. Subjective listening tests also show that the proposed method improves the clarity and intelligibility of the converted speech.
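The abstract names two building blocks without giving formulas: the Itakura-Saito spectral distortion and a summation of Gaussian-component posterior probabilities. The sketch below illustrates the standard textbook form of each, assuming strictly positive power spectra and a toy one-dimensional GMM. The function names, the scalar features, and the "rank by posterior sum" step are illustrative assumptions, not the paper's actual selection procedure or its PI definition.

```python
import math


def itakura_saito(p_ref, p_est):
    """Mean Itakura-Saito distortion between two power spectra.

    Both inputs are equal-length sequences of strictly positive values.
    The measure is asymmetric: swapping the arguments changes the result.
    """
    if len(p_ref) != len(p_est) or not p_ref:
        raise ValueError("spectra must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(p_ref, p_est):
        if a <= 0 or b <= 0:
            raise ValueError("power spectra must be strictly positive")
        r = a / b
        total += r - math.log(r) - 1.0
    return total / len(p_ref)


def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each 1-D Gaussian component given scalar x."""
    likes = []
    for w, m, v in zip(weights, means, variances):
        pdf = math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        likes.append(w * pdf)
    total = sum(likes)
    return [like / total for like in likes]


def rank_components(frames, weights, means, variances):
    """Sum posteriors over all frames; return component indices, largest sum first.

    Assumption: ranking by summed posterior is only one plausible reading of
    "posterior probability summation"; the paper may combine it differently.
    """
    sums = [0.0] * len(weights)
    for x in frames:
        for k, g in enumerate(gmm_posteriors(x, weights, means, variances)):
            sums[k] += g
    return sorted(range(len(sums)), key=lambda k: sums[k], reverse=True)
```

For example, `itakura_saito(spec_target, spec_converted)` could score one converted frame against its target, and `rank_components(...)[:n]` would keep the `n` components that explain the frames best under this assumed criterion.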
Funding: This project was supported by the National Natural Science Foundation of China under grants No. 69925101 and No. 69871023.
Abstract: This investigation proposes a novel method for estimating glottal vocal efficiency (GVE) based on the conversion function of the voice source. The conversion function of the voice source is defined as the ratio of the supra-glottal acoustic voice source signal to the glottal air volume flow velocity waveform in the frequency domain. A carefully designed in vivo canine larynx experiment and several human experiments, covering different vowels, pressed, falsetto, and breathy phonation, and typical laryngeal diseases, were used to demonstrate this alternative GVE method. Compared with other vocal efficiency measures, this method can eliminate the contribution of supraglottal vocal tract transmission and resonance to GVE, and it reflects the differences between phonation modes. The average magnitude of the conversion function in the frequency domain represents GVE, and the variation of the magnitude at the fundamental frequency is identical to that of the AC/DC value.
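The definition above (a frequency-domain ratio whose average magnitude represents GVE) can be illustrated with a minimal sketch. It assumes the two waveforms are already available as equal-length sampled sequences. The plain DFT, the relative floor used to skip near-empty flow bins, and all function names are assumptions made for illustration; the abstract does not describe how the signals were measured or which bins were averaged.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def conversion_magnitude(acoustic, flow, rel_floor=1e-6):
    """Magnitude of acoustic spectrum / flow spectrum per positive-frequency bin.

    Bins where the flow spectrum is below rel_floor times its peak are
    skipped, since the ratio is meaningless there (an assumed safeguard).
    Returns a dict mapping bin index to |A[k]| / |U[k]|.
    """
    if len(acoustic) != len(flow) or not flow:
        raise ValueError("signals must be non-empty and of equal length")
    spec_a = dft(acoustic)
    spec_u = dft(flow)
    floor = rel_floor * max(abs(u) for u in spec_u)
    half = len(flow) // 2
    return {
        k: abs(spec_a[k]) / abs(spec_u[k])
        for k in range(1, half)
        if abs(spec_u[k]) > floor
    }


def mean_magnitude(mags):
    """Average magnitude over the retained bins (the GVE proxy in this sketch)."""
    if not mags:
        raise ValueError("no bins with usable flow energy")
    return sum(mags.values()) / len(mags)
```

With a flow signal that is a pure sinusoid and an acoustic signal that is three times that sinusoid, only the sinusoid's bin is retained and `mean_magnitude` returns a value close to 3.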