Abstract
Earlier approaches to Tibetan sentiment analysis ignore important information such as sentence structure and word order, which leads to low accuracy. To extract semantic and sentiment information at a deeper level, this paper introduces the recursive autoencoder, a deep learning algorithm, into Tibetan sentiment analysis. First, after Tibetan word segmentation, each word is represented by a word vector, so a Tibetan sentence becomes a matrix composed of its word vectors. Second, an unsupervised recursive autoencoder turns this matrix into a single vector; the resulting sentence encoding fuses important information such as semantics and word order. Finally, the classifier in the output layer is trained in a supervised manner on the sentence vectors and their corresponding sentiment labels to predict the sentiment polarity of Tibetan sentences. The experimental section examines how the vector dimension, the reconstruction error coefficient, and the corpus size affect accuracy, and analyzes the relationship between corpus size and training time, noting that the number of sentences in the data set can be reduced moderately when the model must be trained quickly. The experiments show that, with the best parameter combination, the proposed algorithm is about 8.6% more accurate than the semantic space model, one of the better-performing traditional machine learning algorithms.
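The pipeline summarized in the abstract (word vectors, recursive autoencoding into a sentence vector, then an output-layer classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a greedy merge of adjacent nodes by lowest reconstruction error, a tanh encoder with length normalization, and a softmax output layer. The function and parameter names (`init_params`, `encode_pair`, `encode_sentence`, `predict_sentiment`) are hypothetical, the weights are random and untrained, and training, including the weighting of the reconstruction error, is omitted.

```python
import numpy as np


def init_params(dim, n_classes, seed=0):
    """Random, untrained weights (illustrative only; no training is shown)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_enc": rng.normal(0.0, scale, (dim, 2 * dim)),  # children -> parent
        "b_enc": np.zeros(dim),
        "W_dec": rng.normal(0.0, scale, (2 * dim, dim)),  # parent -> children
        "b_dec": np.zeros(2 * dim),
        "W_cls": rng.normal(0.0, scale, (n_classes, dim)),  # softmax layer
        "b_cls": np.zeros(n_classes),
    }


def encode_pair(params, left, right):
    """Merge two child vectors; return (parent, reconstruction error)."""
    children = np.concatenate([left, right])
    parent = np.tanh(params["W_enc"] @ children + params["b_enc"])
    parent = parent / np.linalg.norm(parent)  # length-normalize the parent
    recon = params["W_dec"] @ parent + params["b_dec"]
    error = 0.5 * np.sum((children - recon) ** 2)
    return parent, error


def encode_sentence(params, word_vectors):
    """Greedily merge the adjacent pair with the lowest reconstruction error
    until one vector remains. Assumed strategy, not confirmed by the paper."""
    nodes = [np.asarray(v, dtype=float) for v in word_vectors]
    total_error = 0.0
    while len(nodes) > 1:
        candidates = [
            encode_pair(params, nodes[i], nodes[i + 1])
            for i in range(len(nodes) - 1)
        ]
        best = min(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, error = candidates[best]
        nodes[best:best + 2] = [parent]  # replace the pair with its parent
        total_error += error
    return nodes[0], total_error


def predict_sentiment(params, sentence_vector):
    """Softmax over sentiment classes for one sentence vector."""
    logits = params["W_cls"] @ sentence_vector + params["b_cls"]
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()


if __name__ == "__main__":
    dim = 8
    params = init_params(dim, n_classes=2)
    # Stand-in for a segmented sentence of 5 words; real word vectors
    # would come from a trained embedding.
    sentence = np.random.default_rng(1).normal(size=(5, dim))
    vector, recon_error = encode_sentence(params, sentence)
    print(vector.shape, recon_error, predict_sentiment(params, vector))
```

Because the merge order depends on the order of the words, the final vector reflects word order as well as word content, which is the property the abstract emphasizes.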
Authors
PU Ciren; HOU Jialin; LIU Yue; ZHAI Donghai
(Tibetan Information Technology Research Center, Tibet University, Lhasa 850000, China; School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China)
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journals (北大核心)
2017, No. 7, pp. 1122-1130 (9 pages)
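Funding
National Natural Science Foundation of China, Grant No. 61540060
National Soft Science Research Program of China, Grant No. 2013GXS4D150
Scientific Research Project of the Science and Technology Department of the Tibet Autonomous Region ~~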
Keywords
deep learning
sentiment analysis
recursive autoencoder
recursive neural network