Abstract
To mine hot topics of online public opinion from massive data in real time and with high accuracy, a Hadoop-based model for analyzing online public opinion data is proposed. For the core topic discovery module, a hot topic mining algorithm named WCGFMR is presented, which applies a grouped weighting strategy to public opinion text features using Map (mapping) and Reduce (reduction) rules. Experimental results show that hot topic mining with the WCGFMR algorithm on the Hadoop architecture achieves an average hot topic recall of 85.32% and an average topic cluster purity of 95.36%. When the public opinion data set grows to 2 GB and the number of Map tasks is held fixed, runs with more Reduce tasks finish in much less time than runs with fewer Reduce tasks, which significantly speeds up hot topic mining on the data.
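The abstract reports average recall and average topic cluster purity but does not state how they are computed. The following is a minimal sketch under commonly used definitions, not the authors' evaluation code: purity of a cluster is taken as the share of its documents that carry the majority topic label, and recall of a topic as the share of its documents that land in the cluster where that topic is best represented. The function names and the toy clusters are hypothetical.

```python
from collections import Counter


def cluster_purity(labels):
    """Share of documents in one cluster that carry its majority topic label."""
    if not labels:
        raise ValueError("cluster must not be empty")
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


def topic_recall(clusters, topic):
    """Share of a topic's documents found in the cluster that holds most of them."""
    per_cluster = [cluster.count(topic) for cluster in clusters]
    total = sum(per_cluster)
    if total == 0:
        raise ValueError(f"topic {topic!r} does not occur in any cluster")
    return max(per_cluster) / total


def average(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Hypothetical toy data: each inner list holds the true topic labels
# of the documents that an algorithm placed in one cluster.
clusters = [
    ["A", "A", "A", "B"],
    ["B", "B", "B"],
    ["C", "C", "A"],
]

avg_purity = average(cluster_purity(c) for c in clusters)
avg_recall = average(topic_recall(clusters, t) for t in ("A", "B", "C"))
print(f"average purity: {avg_purity:.4f}, average recall: {avg_recall:.4f}")
```

Other conventions exist, for example weighting each cluster by its size instead of taking a plain mean, so figures computed this way are only comparable with the reported ones if the paper uses the same definitions.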
Source
Journal of Hebei North University (Natural Science Edition) (《河北北方学院学报(自然科学版)》)
2014, Issue 6, pp. 19-24 (6 pages)
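Funding
Hunan Provincial Science and Technology Program Project (2013GK3088)
Ministry of Public Security Science and Technology Innovation Project (2013YYCXHNST035)
Hunan Provincial Teaching Reform Project (2014)
Hunan Provincial Philosophy and Social Science Fund Project (11YBA123)
Hunan Police Academy Research Project (2011YB01)
Scientific Research Project of the Hunan Provincial Department of Education (13C281)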