Abstract: This paper proposes a technique for synthesizing pixel-based, photo-realistic talking face animation using two-step synthesis with HMMs and DNNs. We introduce facial expression parameters as an intermediate representation that corresponds well to both the input contexts and the output pixel data of face images. The sequences of facial expression parameters are modeled using context-dependent HMMs with static and dynamic features. The mapping from the expression parameters to the target pixel images is trained using DNNs. We examine the amount of training data required for the HMMs and DNNs, and compare the performance of the proposed technique with the conventional PCA-based technique through objective and subjective evaluation experiments.
Funding: This research work was supported by the Ministry of Science and Technology of the Republic of China under contract MOST 108-2221-E-390-018.
Abstract: In this study, vector quantization and hidden Markov models (HMMs) were used to achieve speech command recognition. Pre-emphasis, a Hamming window, and Mel-frequency cepstral coefficients were first applied to obtain feature values. Vector quantization and HMMs were then employed to recognize the speech commands. Each recorded utterance was three Chinese characters long. Five phrases, pronounced by a mix of different speakers, were recorded and used to test the models and to determine whether the recognition results were satisfactory.
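The first two front-end steps named in this abstract, pre-emphasis and Hamming windowing, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pre-emphasis coefficient of 0.97, the frame length, the hop size, and all function names are assumptions chosen for the example, and the Mel-frequency cepstral step is left out.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; alpha=0.97 is an assumed, commonly used value."""
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor and is kept unchanged
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def hamming(length):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_and_window(signal, frame_len, hop):
    """Split the signal into overlapping frames and taper each with a Hamming window."""
    window = hamming(frame_len)
    frames = []
    # Trailing samples that do not fill a whole frame are dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


# Hypothetical toy input: ten samples, frames of 4 samples with a hop of 2.
samples = [float(i % 3) for i in range(10)]
windowed_frames = frame_and_window(pre_emphasis(samples), frame_len=4, hop=2)
```

Each windowed frame would then be passed to a spectral analysis stage to compute the cepstral features.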
Abstract: In this paper the authors examine the problems of hidden Markov models (HMMs): the evaluation, decoding, and learning problems. The authors explore an approach to increasing the effectiveness of HMMs in speech recognition. Although hidden Markov modeling has significantly improved the performance of current speech recognition systems, the general problem of completely fluent, speaker-independent speech recognition is still far from solved. For example, no system can reliably recognize unconstrained conversational speech. Nor is there a good statistical method for inferring language structure from a limited corpus of spoken sentences. The authors therefore provide an overview of HMM theory, discuss the role of statistical methods, and point out a range of theoretical and practical issues that deserve attention and must be understood to further advance research in speech recognition.
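Of the three problems listed in this abstract, the evaluation problem asks for the probability of an observation sequence given a model, and it is usually solved with the forward algorithm. The sketch below is an illustration under stated assumptions, not the paper's method: it uses a discrete-output HMM, a hypothetical two-state toy model, and plain probabilities without scaling or log-space arithmetic, so it would underflow on long sequences and expects a non-empty observation list.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations | model) for a discrete HMM via the forward algorithm.

    initial[s]        : probability of starting in state s
    transition[r][s]  : probability of moving from state r to state s
    emission[s][o]    : probability of emitting symbol o in state s
    observations      : non-empty list of symbol indices
    """
    n_states = len(initial)
    # Initialization: alpha_1(s) = pi_s * b_s(o_1)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    # Induction: alpha_t(s) = (sum_r alpha_{t-1}(r) * a_rs) * b_s(o_t)
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][obs]
            for s in range(n_states)
        ]
    # Termination: sum over the final forward variables
    return sum(alpha)


# Hypothetical two-state, two-symbol model used only for illustration.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.5], [0.1, 0.9]]

likelihood = forward_likelihood(initial, transition, emission, [0, 1, 1])
```

The decoding problem replaces the sum in the induction step with a maximum (the Viterbi algorithm), and the learning problem reuses these forward variables inside Baum-Welch re-estimation.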
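Abstract: To continue developing and improving China's community service system and to bring information technology into community services step by step, the project team designed and implemented an HTK-based community voice access service platform. HTK (Hidden Markov Model Toolkit) is a speech processing toolkit based on hidden Markov models (HMMs) and is among the international leaders in the field of speech recognition. Starting from initial Gaussian monophone HMMs and silence-phone HMMs, the platform was trained to obtain triphone HMMs with more robust output distributions. In a speaker-independent, low-noise environment, the word recognition rate reached 93.22% and the whole-sentence recognition rate reached 80.50%, a good recognition result.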