Abstract
Aims: To improve the performance of emotion analysis on Chinese Weibo (microblog) posts. Methods: A fine-grained microblog emotion dictionary containing sentiment, emotion, and part-of-speech information was built from existing sentiment resources and fused with word vectors pre-trained on large-scale text to form emotion word vectors. Oversampling was used to balance the samples and address class imbalance. An attention mechanism was used to build semantic representations of the microblog text and the emotion words, a convolutional neural network (CNN) then extracted features, and finally the microblog text was classified by emotion. Results: Evaluated on the public Weibo emotion analysis dataset of the Natural Language Processing and Chinese Computing conference (NLPCC), the method improved on traditional methods in macro-average, micro-average, and F-score. Conclusions: Combining a CNN with an attention mechanism can markedly improve performance on Weibo emotion analysis.
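Two steps named in the abstract, sample balancing by oversampling and scoring by macro- and micro-averaged F, can be illustrated with a minimal sketch. The abstract does not say which oversampling variant or which exact NLPCC scoring rules were used, so the code below assumes plain random oversampling with replacement and the textbook definitions of macro and micro F1. The function names `random_oversample` and `macro_micro_f1` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.
    Assumption: the paper's oversampling variant is not specified."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def macro_micro_f1(gold, pred):
    """Return (macro F1, micro F1) for single-label predictions.
    Assumption: standard definitions, not NLPCC-specific rules."""
    classes = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

Macro averaging weights every emotion class equally, so it is the measure most sensitive to rare classes, which is one reason balancing the training samples can matter for it.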
Authors
CHEN Xin (陈欣)
YANG Xiaobing (杨小兵)
YAO Yuhong (姚雨虹)
College of Information Engineering, China Jiliang University, Hangzhou 310018, China
Source
Journal of China University of Metrology (《中国计量大学学报》)
2020, No. 3, pp. 370-377 (8 pages)
Funding
National Natural Science Foundation of China (No. 61303146).
Keywords
metrology
sentiment analysis
convolutional neural network
emotion dictionary
sample balance