Abstract
Microblog sentiment analysis has become an active research topic because of its value for enterprise marketing planning, product feedback analysis, public opinion monitoring, and competitive intelligence mining. It typically comprises several tasks, including opinionated sentence recognition, sentiment element extraction, and opinion classification. Because sentiment is expressed mainly through opinionated sentences, recognizing those sentences is decisive for the overall performance of microblog sentiment analysis. This paper proposes a new approach to opinionated sentence recognition in microblogs, based on new-word extension and feature selection. We first extend the sentiment dictionary using microblog emoticons and real data from Sina Weibo. Next, we use a term-merging method to add newly coined Internet words to the segmented word set, which improves word segmentation accuracy. Finally, we fuse microblog-specific features with traditional features such as sentiment words, n-grams, syntax, and topic, and use an SVM classifier to recognize opinionated sentences. Experiments on 45,566 real microblogs covering 20 topics from Tencent Weibo show that the proposed method achieves good precision and F-measure.
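The term-merging and feature steps summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual algorithm, so everything here is an assumption made for illustration: the function names `merge_new_words` and `extract_features`, the greedy longest-match strategy, the `max_span` parameter, and the toy lexicons are all hypothetical, and the SVM training step is omitted.

```python
from typing import Dict, List, Set


def merge_new_words(tokens: List[str], new_words: Set[str], max_span: int = 3) -> List[str]:
    """Greedily merge adjacent tokens whose concatenation is a known new word.

    Hypothetical stand-in for the paper's term-merging step; the real
    method is not described in the abstract.
    """
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so that longer new words take priority.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in new_words:
                merged.append(candidate)
                i += span
                break
        else:
            # No merge applied at this position: keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


def extract_features(tokens: List[str], sentiment_words: Set[str], emoticons: Set[str]) -> Dict[str, int]:
    """Count two simple cues: sentiment-lexicon hits and microblog emoticons."""
    return {
        "sentiment_word_count": sum(1 for t in tokens if t in sentiment_words),
        "emoticon_count": sum(1 for t in tokens if t in emoticons),
    }


# Toy example: a segmenter has split the Internet word "给力" into two tokens.
tokens = ["这", "手机", "给", "力", "[哈哈]"]
merged = merge_new_words(tokens, new_words={"给力"})
features = extract_features(merged, sentiment_words={"给力"}, emoticons={"[哈哈]"})
print(merged)
print(features)
```

In a full pipeline, feature dictionaries like this one would be combined with n-gram, syntactic, and topic features and passed to an SVM classifier; that part is left out here because the abstract gives no details on it.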
Source
《情报学报》
CSSCI
PKU Core Journal (北大核心)
2013, No. 9, pp. 945-951 (7 pages)
Journal of the China Society for Scientific and Technical Information
Funding
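General Program of the National Natural Science Foundation of China, "Research on Microblog Emergency Detection and Short-Term Prediction Based on Spatio-Temporal Semantics" (No. 71273010)
General Program of the Anhui Provincial Natural Science Foundation (No. 1208085MG117)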
Keywords
microblog, sentiment analysis, opinionated sentence recognition, feature fusion