Journal Article

Speech synthesis using simplified LSTM (简化LSTM的语音合成)
Cited by: 4
Abstract: With larger amounts of training data, conventional parametric speech synthesis based on hidden Markov models (HMM) can hardly obtain further improvement in prediction quality. Long Short-Term Memory (LSTM) networks learn long-range dependencies within a sequence and, with large-scale parallel numerical computation, yield more accurate duration prediction and smoother spectral trajectories, yet they also contain computation that can be simplified. This paper first analyses the structure of the bidirectional LSTM, then removes the forget gate and the output gate, and finally models the mapping from text phoneme information to cepstral features. Comparative experiments on a Mandarin corpus show that the simplified bidirectional LSTM halves the amount of computation (both training and prediction time), while the Mel cepstral distortion drops from the HMM's 3.4661 to 1.9459.
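The simplification described in the abstract, an LSTM cell whose forget gate and output gate are removed, can be illustrated with a minimal sketch. The paper's exact equations are not reproduced here; the formulation below is an illustrative assumption in which the input gate is kept, the cell state accumulates without a forget gate, and the hidden output is the ungated tanh of the cell state. The function and weight names (simplified_lstm_step, W_i, U_i, W_c, U_c) and the dimensions are hypothetical.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simplified_lstm_step(x_t, h_prev, c_prev, params):
    # One step of a simplified LSTM cell: input gate only,
    # no forget gate and no output gate (illustrative formulation).
    W_i, U_i, b_i, W_c, U_c, b_c = params
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)     # input gate
    c_hat = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)   # candidate cell state
    c_t = c_prev + i_t * c_hat                        # no forget gate: state accumulates
    h_t = np.tanh(c_t)                                # no output gate: expose state directly
    return h_t, c_t

# Toy usage: one frame of (hypothetical) phoneme features in, one hidden step out.
dim_x, dim_h = 8, 4
rng = np.random.default_rng(0)
params = (
    rng.standard_normal((dim_h, dim_x)), rng.standard_normal((dim_h, dim_h)), np.zeros(dim_h),
    rng.standard_normal((dim_h, dim_x)), rng.standard_normal((dim_h, dim_h)), np.zeros(dim_h),
)
h, c = np.zeros(dim_h), np.zeros(dim_h)
h, c = simplified_lstm_step(rng.standard_normal(dim_x), h, c, params)

A bidirectional variant, as used in the paper, would run two such cells over the input sequence in opposite directions and concatenate their hidden states. Because two of the four gates are removed, roughly half of the gate weight matrices and matrix-vector products disappear, which is consistent with the reported halving of training and prediction time.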
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2018, No. 3, pp. 131-135 (5 pages)
Funding: National Science and Technology Support Program of China (No. 2015BAH01F02)
Keywords: parametric speech synthesis; neural network; Long Short-Term Memory (LSTM)