Abstract
To address the inability of the traditional LDA topic model to reflect word order and the relationships among words, an improved weighted LDA (W-LDA) emotion classification method is proposed. First, an average weighting value is introduced into the model's topic sampling and into the calculation of its distribution expectations, which prevents words closely related to a topic from being drowned out by high-frequency words and thereby improves the discrimination among topics. Second, building on the resulting high-quality document-topic distributions and topic-word vectors, the support vector machine (SVM) algorithm is introduced to construct an emotion analysis method for text corpora that combines emotion word analysis and extraction, topic distribution computation, and emotion classification. Finally, the effectiveness of the method is verified on real teaching evaluation data and a public review dataset. The results show that the proposed method clearly outperforms both the SVM algorithm and the algorithm of reference [15] in topic discrimination, classification accuracy, and F1-Measure.
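The weighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's weighting formula, so the inverse-document-frequency style weight below, and the names `word_weights` and `weighted_topic_word`, are assumptions made only for illustration; they are not the author's W-LDA definition. The sketch shows how weighting the word counts of one topic lowers the share of a high-frequency word in the estimated topic-word distribution.

```python
import math
from collections import Counter


def word_weights(docs):
    """Assumed weight (not from the paper): words that occur in fewer
    documents receive a larger weight, in the style of smoothed IDF."""
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0
            for w, df in doc_freq.items()}


def weighted_topic_word(counts, weights, beta=0.01):
    """Weighted estimate of the topic-word distribution for one topic.

    counts:  word -> number of tokens of that word assigned to the topic
    weights: word -> weight over the whole vocabulary
    beta:    symmetric Dirichlet smoothing parameter
    """
    vocab = sorted(weights)
    denom = sum(weights[w] * counts.get(w, 0) for w in vocab) + beta * len(vocab)
    return {w: (weights[w] * counts.get(w, 0) + beta) / denom for w in vocab}


# Toy corpus: "course" appears in every document, "teacher" in only one.
docs = [["course", "good", "teacher"],
        ["course", "boring"],
        ["course", "good", "homework"]]
weights = word_weights(docs)

# Hypothetical token counts assigned to a single topic.
counts = {"course": 3, "good": 2, "teacher": 1}
phi_weighted = weighted_topic_word(counts, weights)
phi_plain = weighted_topic_word(counts, {w: 1.0 for w in weights})
```

With all weights set to 1.0 the function reduces to the usual smoothed count estimate, so `phi_plain` serves as the unweighted baseline for comparison. In the pipeline the abstract describes, document-topic distributions estimated in this way would then be passed to an SVM classifier as features; that step is not shown here.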
Author
郭晓慧
GUO Xiaohui (Institute of Information Engineering, Yango University, Fuzhou 350015, China)
Source
《延边大学学报(自然科学版)》
CAS
2018, No. 3, pp. 266-273 (8 pages)
Journal of Yanbian University (Natural Science Edition)
Funding
Scientific Research Project of the Fujian Provincial Department of Education (JA15631)
Keywords
commentary corpus
LDA topic model
support vector machine
emotion classification