Abstract
To improve the accuracy of sentiment analysis on micro-blog texts, verbs and adjectives in the texts are selected as features, and a feature dimensionality reduction method based on a hierarchical structure is proposed. Feature polarity values are computed with an emoticon-based method. On this basis, a position weight calculation method based on the feature polarity values is proposed, and a support vector machine (SVM) is used as the machine learning model to classify micro-blog texts into three categories: positive, negative, and neutral. In other words, multiple features are extracted, and a lexicon-based method is combined with a machine learning method to improve classification accuracy. Experimental results show that the method achieves an average accuracy of 72.16%. The proposed sentiment analysis method, based on multiple features and a combined classifier, classifies the sentiment of Chinese micro-blog texts fairly effectively.
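The abstract does not give the formulas for emoticon-based polarity or for position weighting, so the following is only a minimal sketch of how the two steps might fit together. The emoticon sets, the co-occurrence ratio used for polarity, and the linear position weight are all assumptions made for illustration, not the authors' definitions. Feature extraction (keeping only verbs and adjectives) is assumed to have been done already.

```python
from collections import Counter

# Hypothetical emoticon sets; the paper's actual emoticon lexicon is not given.
POSITIVE_EMOTICONS = {"[haha]", "[good]"}
NEGATIVE_EMOTICONS = {"[angry]", "[tears]"}


def feature_polarity(posts):
    """Estimate a polarity in [-1, 1] for each feature word.

    posts: list of (features, emoticons) pairs, where features are the
    verbs/adjectives already extracted from one post.
    Assumed formula: (pos - neg) / (pos + neg) over co-occurrence counts.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for features, emoticons in posts:
        has_pos = any(e in POSITIVE_EMOTICONS for e in emoticons)
        has_neg = any(e in NEGATIVE_EMOTICONS for e in emoticons)
        if has_pos == has_neg:
            continue  # no emoticon signal, or a mixed one: skip the post
        target = pos_counts if has_pos else neg_counts
        target.update(set(features))  # count each word once per post
    polarity = {}
    for word in set(pos_counts) | set(neg_counts):
        p, n = pos_counts[word], neg_counts[word]
        polarity[word] = (p - n) / (p + n)
    return polarity


def position_weighted_score(features, polarity):
    """Sum feature polarities, weighting later features more heavily.

    The linear weight (i + 1) / n is an assumption for illustration only.
    """
    n = len(features)
    total = 0.0
    for i, word in enumerate(features):
        weight = (i + 1) / n
        total += weight * polarity.get(word, 0.0)  # unknown words count as 0
    return total


posts = [
    (["like", "happy"], ["[haha]"]),
    (["hate"], ["[angry]"]),
    (["like"], ["[good]"]),
    (["like", "hate"], ["[tears]"]),
]
pol = feature_polarity(posts)
score = position_weighted_score(["like", "hate"], pol)
```

In a pipeline like the one the abstract describes, position-weighted polarity values of this kind would form the feature vector passed to an SVM for the three-way positive/negative/neutral decision, rather than being thresholded directly.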
Source
《北京信息科技大学学报(自然科学版)》
2013, Issue 4, pp. 39-45 (7 pages)
Journal of Beijing Information Science and Technology University
Funding
Supported by the National Natural Science Foundation of China (61171159, 61271304)
Keywords
micro-blog
emoticon
combined classification
position weight
sentiment classification