Abstract: This paper describes the latest version of the Chinese-Japanese-English handheld speech-to-speech translation system developed by NICT/ATR, which is now ready to be deployed for travelers. With the entire speech-to-speech translation function implemented in one terminal, it realizes real-time, location-free speech-to-speech translation. A new noise-suppression technique notably improves speech recognition performance. Corpus-based approaches to speech recognition, machine translation, and speech synthesis enable coverage of a wide variety of topics and portability to other languages. Test results show that the character accuracy of speech recognition is 82%-94% for Chinese speech, and the bilingual evaluation understudy (BLEU) score of machine translation is 0.55-0.74 for Chinese-Japanese and Chinese-English.
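The abstract above reports machine translation quality as a BLEU score. As a minimal illustration of what that metric measures (this is a generic sentence-level sketch following the standard definition, not the evaluation code used by the authors), BLEU is the geometric mean of modified n-gram precisions between a candidate translation and a reference, scaled by a brevity penalty:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with a single reference: geometric mean of
    modified n-gram precisions (n = 1..max_n) times a brevity penalty.
    Inputs are lists of tokens."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # any empty n-gram overlap zeroes the geometric mean
    log_prec = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) \
        else math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_prec)
```

A perfect match scores 1.0, and a score of 0.55-0.74, as reported here, indicates substantial n-gram overlap with reference translations. Corpus-level BLEU (as typically reported in papers) aggregates n-gram counts over all sentences before taking the geometric mean, and also applies smoothing, so this per-sentence sketch is only illustrative.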
Funding: supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10063424, "Development of distant speech recognition and multi-task dialog processing technologies for in-door conversational robots").
Abstract: Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have driven tremendous improvements over acoustic models based on Gaussian Mixture Models (GMMs). However, models based on this hybrid method require a force-aligned Hidden Markov Model (HMM) state sequence obtained from the GMM-based acoustic model, so training takes a long time because both the GMM-based acoustic model and the deep learning-based acoustic model must be trained. To solve this problem, an acoustic model using the Connectionist Temporal Classification (CTC) algorithm is proposed. The CTC algorithm does not require the GMM-based acoustic model because it does not use the force-aligned HMM state sequence. However, previous work on LSTM RNN-based acoustic models using CTC used small-scale training corpora. In this paper, an LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves a Word Error Rate (WER) of 6.18% for clean speech and 15.01% for noisy speech, which is similar to the performance of an acoustic model based on the hybrid method.
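The key property of CTC described above is that it sums over all frame-level alignments of a label sequence, so no forced HMM alignment is needed. A minimal sketch of that idea (the standard CTC forward recursion over a blank-interleaved label sequence; this is illustrative and not the paper's implementation) is:

```python
def ctc_forward(probs, label, blank=0):
    """CTC forward (alpha) recursion: the probability that per-frame
    softmax outputs `probs` (T frames x K symbols) emit `label` after
    collapsing repeated symbols and removing blanks. Sums over every
    valid alignment, so no frame-level labels are required."""
    # Interleave blanks: l1, l2 -> blank, l1, blank, l2, blank
    ext = [blank]
    for c in label:
        ext += [c, blank]
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    # Valid paths start with either a blank or the first label.
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                      # stay on same state
            if s >= 1:
                a += alpha[t - 1][s - 1]             # advance one state
            # Skipping a blank is allowed only between distinct labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    # Paths may end on the final label or the trailing blank.
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
```

For example, with two frames, symbols {blank, a} and label [a], the three alignments (a,a), (a,blank), and (blank,a) all collapse to "a", and the recursion returns the sum of their path probabilities. In training, the negative log of this quantity is the CTC loss, and its gradient flows directly into the LSTM RNN, which is why the GMM alignment stage can be dropped. A production implementation would work in log space for numerical stability.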