Abstract
To reduce the number of model training parameters and the training time, and to improve the recognition accuracy for isolated words, a method that applies convolutional neural networks (CNNs) to speech recognition is proposed. The network's distinctive structures (local perception fields, i.e. local receptive fields; weight sharing; and pooling) greatly reduce the size of the trained model while preserving recognition performance. The effects on recognition results of the number and size of the convolution filters in the convolutional layers, and of the pooling parameters in the pooling layers, are analyzed in depth. A dynamic time warping (DTW) network normalizes the feature parameters of pronunciation units with different frame lengths to the same number of frames, which are then fed into the network for recognition. Experimental results on a self-built database show that, compared with a traditional deep neural network (DNN), the CNN improves speech recognition accuracy by 12%, which makes it an excellent speech recognition model.
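The claim that local perception fields, weight sharing, and pooling shrink the model can be checked with simple parameter arithmetic. The sketch below is a minimal illustration, not the authors' configuration: the 40-frame × 39-coefficient input, the 1024-unit fully connected layer, the 32 filters of size 5×5, and the 2×2 pooling window with stride 2 are all hypothetical values, and the helper names `dense_params`, `conv2d_params`, `conv_out_size`, and `pooled_size` are likewise illustrative.

```python
def dense_params(in_features: int, units: int) -> int:
    """Weights + biases of a fully connected layer: every input reaches every unit."""
    return in_features * units + units


def conv2d_params(in_channels: int, num_filters: int, kh: int, kw: int) -> int:
    """Weights + biases of a 2-D convolutional layer.

    Each filter only looks at a kh x kw local patch (local perception field), and
    the same weights are reused at every position (weight sharing), so the count
    does not depend on the size of the input feature map.
    """
    return num_filters * (in_channels * kh * kw + 1)


def conv_out_size(length: int, kernel: int) -> int:
    """Output length along one axis for an unpadded ('valid') stride-1 convolution."""
    return length - kernel + 1


def pooled_size(length: int, pool: int, stride: int) -> int:
    """Output length along one axis for an unpadded pooling window."""
    return (length - pool) // stride + 1


if __name__ == "__main__":
    # Hypothetical input: 40 frames x 39 feature coefficients, one channel.
    frames, coeffs = 40, 39
    print("dense layer params:", dense_params(frames * coeffs, 1024))  # 1598464
    print("conv layer params: ", conv2d_params(1, 32, 5, 5))           # 832

    # Size of each filter's feature map before and after 2x2 pooling with stride 2.
    h, w = conv_out_size(frames, 5), conv_out_size(coeffs, 5)          # 36 x 35
    ph, pw = pooled_size(h, 2, 2), pooled_size(w, 2, 2)                # 18 x 17
    print("inputs to next layer without pooling:", 32 * h * w)         # 40320
    print("inputs to next layer with pooling:   ", 32 * ph * pw)       # 9792
```

Under these assumed sizes the convolutional layer needs 832 parameters against roughly 1.6 million for the dense layer, and 2×2 pooling cuts the number of values passed to the next layer by about a factor of four. The actual savings in the paper depend on its own filter counts, filter sizes, and pooling parameters, which the abstract does not state.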
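The abstract does not spell out how the DTW network brings pronunciation units of different lengths to a common frame count. One common way to do this, sketched below purely as an assumption, is to align each utterance to a fixed-length reference template with classic DTW and then average the input frames mapped onto each template frame, so every output has exactly as many frames as the template. The functions `dtw_path` and `normalize_frames`, the Euclidean frame distance, and the 50-frame, 13-dimensional template are hypothetical choices for illustration.

```python
import numpy as np


def dtw_path(x: np.ndarray, ref: np.ndarray) -> list:
    """Optimal DTW alignment between x (T x D) and ref (N x D) as (i, j) frame pairs."""
    T, N = len(x), len(ref)
    # Euclidean distance between every frame of x and every reference frame.
    dist = np.linalg.norm(x[:, None, :] - ref[None, :, :], axis=-1)
    # Accumulated cost with an extra row/column of infinities as the boundary.
    acc = np.full((T + 1, N + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, N + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Backtrack from the last frame pair to the first along the cheapest predecessor.
    i, j = T, N
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def normalize_frames(x: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Warp x onto the N frames of ref by averaging the x frames aligned to each ref frame."""
    n, d = ref.shape
    out = np.zeros((n, d))
    counts = np.zeros(n)
    for i, j in dtw_path(x, ref):
        out[j] += x[i]
        counts[j] += 1
    # A DTW path visits every reference frame at least once, so counts > 0.
    return out / counts[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.normal(size=(50, 13))  # hypothetical reference template
    for length in (37, 81):
        utterance = rng.normal(size=(length, 13))
        print(length, "->", normalize_frames(utterance, template).shape)
```

Averaging is only one way to collapse many-to-one alignments; keeping the first aligned frame or interpolating along the path would also yield a fixed frame count, and the paper may use a different rule.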
Authors
HOU Yi-min (侯一民)
LI Yong-ping (李永平)
School of Automation Engineering, Northeast Electric Power University, Jilin 132012, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2019, No. 6, pp. 1751-1756 (6 pages)
Funding
Jilin Province Science and Technology Development Plan Fund Project (20150414051GH)
Keywords
convolutional neural networks
speech recognition
local perception field
weight sharing
pooling