Microblog Bursty Events Detection Method Based on Multiple Word Features

Cited by: 5
Abstract: In recent years, bursty events of many kinds have occurred frequently across various fields, affecting both the stability and the development of society. This paper proposes a microblog bursty event detection model based on multiple word features. The model detects bursty events in massive microblog data, helping decision-makers monitor microblogs and guide public opinion so as to minimize the harm such events cause to society. First, the microblog data are sliced into time windows according to their timestamps, and for every word in each window three features are computed separately: the word frequency feature, the topic tag (hashtag) feature, and the word frequency growth rate feature. Next, D-S (Dempster-Shafer) evidence theory and the analytic hierarchy process (AHP) are used to determine the weight of each feature, and the weighted features are fused into the word's bursty feature value. Words with large bursty feature values are selected to form the bursty feature word set, and a coupling degree matrix over this set is constructed from word co-occurrence degree and tightness of association. Finally, the coupling degree matrix is fed to a hierarchical agglomerative clustering algorithm, which generates a binary tree whose leaf nodes are the bursty words; a binary tree pruning algorithm based on intra-cluster similarity then partitions the clustering result, yielding the bursty events of the corresponding time window. Experimental results show that the bursty-word-based event detection model performs best when the intra-cluster similarity threshold is 1.1, reaching a precision of 0.8462, a recall of 0.8684, and an F value of 0.8571, which demonstrates the effectiveness of the proposed method.
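The first step of the pipeline (time slicing plus three per-word features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: posts are assumed to be `(timestamp_seconds, tokens, hashtag_tokens)` tuples, and the feature definitions (post-level frequency, share of posts where the word is in a hashtag, add-one-smoothed growth rate) and the helper names `slice_into_windows` and `word_features` are hypothetical, since the abstract does not give the paper's formulas.

```python
from collections import Counter, defaultdict

def slice_into_windows(posts, window_seconds):
    """Group posts into consecutive fixed-length time windows.

    Each post is assumed to be (timestamp_seconds, tokens, hashtag_tokens).
    Returns {window_index: [(tokens, hashtag_tokens), ...]}.
    """
    windows = defaultdict(list)
    for ts, tokens, tags in posts:
        windows[int(ts // window_seconds)].append((tokens, tags))
    return dict(windows)

def word_features(window_posts, prev_df=None):
    """Compute three illustrative features per word for one window.

    - frequency: number of posts in the window containing the word
    - tag feature: share of those posts in which the word appears in a hashtag
    - growth rate: (df_now - df_prev) / (df_prev + 1), add-one smoothed
    All three definitions are assumptions, not the paper's formulas.
    Returns (features, df) so df can be passed to the next window.
    """
    prev_df = prev_df or {}
    df = Counter()      # posts containing the word as a token or hashtag
    tag_df = Counter()  # posts whose hashtags contain the word
    for tokens, tags in window_posts:
        df.update(set(tokens) | set(tags))
        tag_df.update(set(tags))
    features = {}
    for word, f in df.items():
        prev = prev_df.get(word, 0)
        features[word] = (f, tag_df[word] / f, (f - prev) / (prev + 1))
    return features, df

# Toy data: 60-second windows, window 0 = [0, 60), window 1 = [60, 120).
posts = [
    (0, ["earthquake", "city"], ["earthquake"]),
    (10, ["weather", "city"], []),
    (70, ["earthquake", "rescue"], ["earthquake"]),
    (80, ["earthquake", "city"], []),
    (90, ["rescue", "earthquake"], ["rescue"]),
]
windows = slice_into_windows(posts, 60)
feats0, df0 = word_features(windows[0])
feats1, df1 = word_features(windows[1], prev_df=df0)
print(feats1["earthquake"])  # (3, 0.3333333333333333, 1.0)
```

Passing the previous window's document frequencies forward keeps each window's computation independent while still allowing a growth-rate feature.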
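The abstract says D-S evidence theory contributes to the feature weighting but does not say how. The sketch below shows only the core of D-S theory, Dempster's rule of combination, applied to a frame {bursty, normal} for a single word. The idea that each feature supplies one mass function, and all mass values used here, are hypothetical illustrations rather than the paper's procedure.

```python
from functools import reduce
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    A mass function maps frozenset focal elements to masses summing to 1.
    Mass on empty intersections (conflict K) is discarded and the rest is
    renormalised by 1 / (1 - K).
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Frame of discernment for one word: bursty or not bursty.
B = frozenset({"bursty"})
N = frozenset({"normal"})
THETA = B | N  # mass on the whole frame expresses "don't know"

# Hypothetical evidence from the three features for the same word.
m_freq = {B: 0.6, N: 0.1, THETA: 0.3}
m_tag = {B: 0.5, N: 0.2, THETA: 0.3}
m_growth = {B: 0.7, N: 0.1, THETA: 0.2}

m_two = dempster_combine(m_freq, m_tag)
m_all = reduce(dempster_combine, [m_freq, m_tag, m_growth])
print(round(m_two[B], 4))  # 0.759
```

Here the conflict between the first two sources is K = 0.17, so the surviving mass is rescaled by 1/0.83; agreeing evidence from the growth-rate source raises the belief in "bursty" further.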
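The weighting and fusion step can be illustrated with the AHP half alone, since the abstract does not describe how AHP and D-S results are merged. This sketch derives feature weights as the principal eigenvector of a pairwise comparison matrix (by power iteration), reports Saaty's consistency ratio, min-max normalises each feature, and takes the weighted sum as a bursty feature value. The comparison matrix, the normalisation, the 0.5 selection cut-off, and the function names are all hypothetical.

```python
# Saaty's random consistency index; this sketch supports 3 to 5 criteria.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix, iterations=100, tol=1e-12):
    """Priority weights = principal eigenvector of a pairwise comparison
    matrix, found by power iteration. Returns (weights, consistency_ratio)."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(av)
        new_v = [x / total for x in av]
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(av[i] / v[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return v, ci / RANDOM_INDEX[n]

def bursty_values(features, weights):
    """Min-max normalise each feature across words, then take the weighted sum."""
    words = list(features)
    columns = list(zip(*(features[w] for w in words)))
    scores = {w: 0.0 for w in words}
    for k, col in enumerate(columns):
        lo, hi = min(col), max(col)
        for w, x in zip(words, col):
            norm = (x - lo) / (hi - lo) if hi > lo else 0.0
            scores[w] += weights[k] * norm
    return scores

# Hypothetical judgement: frequency is twice as important as each other feature.
comparison = [
    [1.0, 2.0, 2.0],
    [0.5, 1.0, 1.0],
    [0.5, 1.0, 1.0],
]
weights, cr = ahp_weights(comparison)

features = {  # (frequency, tag feature, growth rate) per word
    "earthquake": (3, 1 / 3, 1.0),
    "rescue": (2, 0.5, 2.0),
    "city": (1, 0.0, -1 / 3),
}
scores = bursty_values(features, weights)
top = [w for w, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s >= 0.5]
print(weights, cr, top)
```

A perfectly consistent comparison matrix, as here, gives a consistency ratio of zero; in practice AHP judgements are usually accepted when the ratio is below 0.1.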
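The coupling degree matrix combines co-occurrence degree with tightness of association, but the abstract gives neither formula. As a stand-in, this sketch uses the overlap coefficient for co-occurrence degree and the Jaccard index for tightness, and sums the two, so each entry lies in [0, 2]. These definitions and the `coupling_matrix` helper are assumptions for illustration only.

```python
from itertools import combinations

def coupling_matrix(posts, words):
    """Pairwise coupling degree between bursty words.

    Assumed definitions (not the paper's formulas):
      co-occurrence degree = n_ij / min(n_i, n_j)       (overlap coefficient)
      tightness            = n_ij / (n_i + n_j - n_ij)  (Jaccard index)
      coupling             = co-occurrence degree + tightness, in [0, 2]
    where n_i counts posts containing word i and n_ij posts containing both.
    """
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    df = [0] * n
    co = [[0] * n for _ in range(n)]
    for post in posts:
        present = sorted({index[w] for w in post if w in index})
        for i in present:
            df[i] += 1
        for i, j in combinations(present, 2):
            co[i][j] += 1
            co[j][i] += 1
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        nij = co[i][j]
        if nij:  # n_i, n_j >= n_ij >= 1, so no division by zero
            overlap = nij / min(df[i], df[j])
            jaccard = nij / (df[i] + df[j] - nij)
            matrix[i][j] = matrix[j][i] = overlap + jaccard
    return matrix

posts = [
    {"earthquake", "rescue", "magnitude"},
    {"earthquake", "rescue"},
    {"earthquake", "magnitude"},
    {"concert", "ticket"},
    {"concert", "ticket", "singer"},
]
words = ["earthquake", "rescue", "magnitude", "concert", "ticket"]
m = coupling_matrix(posts, words)
print(round(m[0][1], 3), round(m[3][4], 3))  # 1.667 2.0
```

Words that are not in the bursty feature word set (here "singer") are ignored, so the matrix stays small even when posts are long.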
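The final step, agglomerative clustering into a binary tree followed by threshold-based pruning, can be sketched as below. Assumptions: average linkage on the coupling matrix, "internal similarity" taken as the mean pairwise coupling among a subtree's leaves, top-down pruning that keeps a subtree as one event once it reaches the threshold, and singletons kept as their own clusters. The abstract names the technique but not these details, so they and the helper names are hypothetical. The threshold 1.1 is reused only for illustration; on this sketch's [0, 2] scale it need not transfer.

```python
from itertools import combinations

def build_tree(sim):
    """Average-linkage agglomerative clustering on a similarity matrix.

    Returns a binary tree of nested 2-tuples whose leaves are row indices.
    """
    if not sim:
        raise ValueError("empty similarity matrix")
    clusters = {i: (i, [i]) for i in range(len(sim))}  # id -> (subtree, members)
    next_id = len(sim)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            ma, mb = clusters[a][1], clusters[b][1]
            avg = sum(sim[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
            if best is None or avg > best[0]:
                best = (avg, a, b)
        _, a, b = best
        (ta, ma), (tb, mb) = clusters.pop(a), clusters.pop(b)
        clusters[next_id] = ((ta, tb), ma + mb)
        next_id += 1
    return next(iter(clusters.values()))[0]

def leaves(tree):
    return [tree] if isinstance(tree, int) else leaves(tree[0]) + leaves(tree[1])

def internal_similarity(members, sim):
    """Average pairwise similarity inside a cluster (assumed definition)."""
    pairs = list(combinations(members, 2))
    return sum(sim[i][j] for i, j in pairs) / len(pairs)

def prune(tree, sim, threshold):
    """Cut the tree top-down: keep a subtree as one event once its internal
    similarity reaches the threshold, otherwise split it into its children."""
    members = leaves(tree)
    if len(members) == 1 or internal_similarity(members, sim) >= threshold:
        return [members]
    return prune(tree[0], sim, threshold) + prune(tree[1], sim, threshold)

F, H = 5 / 3, 5 / 6  # toy coupling values for five bursty words
sim = [
    [0.0, F, F, 0.0, 0.0],
    [F, 0.0, H, 0.0, 0.0],
    [F, H, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 2.0, 0.0],
]
words = ["earthquake", "rescue", "magnitude", "concert", "ticket"]
tree = build_tree(sim)
events = prune(tree, sim, threshold=1.1)
print([[words[i] for i in sorted(e)] for e in events])
# [['concert', 'ticket'], ['earthquake', 'rescue', 'magnitude']]
```

Raising the threshold splits clusters more aggressively: at 1.5 the earthquake cluster breaks into {earthquake, rescue} and {magnitude}, which shows why the paper tunes this threshold experimentally.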
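As a quick consistency check on the reported results, the F value is the harmonic mean of precision P and recall R:

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.8462 \times 0.8684}{0.8462 + 0.8684}
  \approx 0.857
```

This agrees with the reported 0.8571 once the rounding of P and R to four decimal places is taken into account.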
Authors: ZHANG Yang-sen, DUAN Yu-xiang, WANG Jian, WU Yun-fang (Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China; Institute of Computational Linguistics, Peking University, Beijing 100871, China; Beijing Laboratory of National Economic Security Early-warning Engineering, Beijing 100044, China)
Source: Acta Electronica Sinica (《电子学报》), 2019, No. 9, pp. 1919-1928 (10 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
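Funding: National Natural Science Foundation of China (No. 61772081); Science and Technology Innovation Service Capacity Building, Research Base Construction, Beijing Laboratory program: Beijing Laboratory of National Economic Security Early-warning Engineering project (No. PXM2018-014224-000010)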
Keywords: microblog; bursty events; bursty feature words; D-S evidence theory; hierarchical agglomerative clustering