Abstract
[Objective] This paper proposes a method that uses sentiment analysis to automatically identify rumors in specific domains. [Methods] We first defined high-quality and low-quality information sources. Assuming that information from high-quality sources is more reliable, we used a lexicon-based sentiment analysis method to quantify the difference in sentiment that the two kinds of sources express toward a given object. Information from a low-quality source was judged to be a rumor if this sentiment difference exceeded a pre-set threshold. [Results] We applied the method to two domains, "food and wellness" and "medicine and health", and correctly identified 23 of 30 suspected rumor cases, an accuracy of 76.67%. For rumor prediction, the F-value was 83.34%, recall 71.42%, and precision 100%; for non-rumor texts, the F-value was 72.73%, recall 100%, and precision 57.14%. [Limitations] Data were not extracted automatically from the different sources, and the number of rumors collected manually for each case was limited. [Conclusions] The proposed sentiment-analysis-based method is effective for identifying specific types of rumors.
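The comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy English lexicon, the regex tokenizer, the averaging of scores, the use of an absolute difference, and the threshold value of 1.0 are all assumptions made for illustration. Chinese text would also need word segmentation before lexicon lookup.

```python
import re
from statistics import mean

# Hypothetical toy lexicon (word -> polarity); a real sentiment lexicon is far larger.
LEXICON = {
    "safe": 1.0, "healthy": 1.0, "beneficial": 1.0,
    "toxic": -1.0, "harmful": -1.0, "dangerous": -1.0,
}


def sentiment(text: str) -> float:
    """Average polarity of the lexicon words found in the text (0.0 if none)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return mean(scores) if scores else 0.0


def source_sentiment(texts: list[str]) -> float:
    """Average sentiment over all texts collected from one kind of source."""
    return mean([sentiment(t) for t in texts]) if texts else 0.0


def is_rumor(low_quality_text: str, high_quality_texts: list[str],
             threshold: float = 1.0) -> bool:
    """Flag the low-quality text when its sentiment toward the object differs
    from the high-quality sources by more than the (assumed) threshold."""
    diff = abs(sentiment(low_quality_text) - source_sentiment(high_quality_texts))
    return diff > threshold


if __name__ == "__main__":
    reliable = ["Microwaved food is safe.", "Microwaved food is healthy and safe."]
    print(is_rumor("Microwaved food is toxic and harmful!", reliable))
    print(is_rumor("Microwaved food is safe to eat.", reliable))
```

In this sketch the high-quality sources act as the reference point, so the choice of threshold controls how much disagreement is tolerated before a text is flagged.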
Authors
Shou Huanrong (首欢容)
Deng Shuqing (邓淑卿)
Xu Jian (徐健)
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
2017, Issue 7, pp. 44-51 (8 pages)
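Funding
This work is one of the research outcomes of the following projects:
National Social Science Fund of China project "Research on Sentiment Analysis of User Reviews and Its Application in Competitive Intelligence Services" (Grant No. 11CTQ022)
Guangdong Province Science and Technology Special Project "Content-Based Scientific and Technical Literature Analysis Service Platform" (Grant No. 2016B030303003)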
Keywords
Sentiment Analysis
Sentiment Lexicon
Rumor Detection
Rumor Identification