Abstract
This paper proposes an event detection method for news streams based on the analysis of bursty features. An event is modeled as a minimal set of features that co-occur within a certain time window and are strongly supported by documents in the text stream. Such features typically exhibit a burst when the event occurs. An elastic burst detection algorithm first identifies bursty features at multiple time scales. The method then computes two quantities for the bursty features: the similarity between their burst patterns, and the overlap of the news documents in which they appear. The affinity propagation clustering algorithm uses these to group features with high document overlap and similar distributions over their bursty time windows into events. Experiments were run on a real-life dataset, Reuters Corpus Volume 1 (RCV1), which contains over 800,000 news reports spanning one year. They show that the algorithm accurately identifies bursts of features across time spans of various lengths and detects events effectively.
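The Python sketch below illustrates the pipeline summarized in the abstract. It has three stages: multi-scale burst detection, pairwise feature similarity built from burst windows and shared documents, and affinity propagation clustering. It is a minimal illustration under stated assumptions, not the paper's implementation. Several details are assumptions:

- the burst threshold w*mu + k*sigma*sqrt(w);
- the window sizes;
- the Jaccard-based similarity with weight `alpha`;
- the median "preference" on the diagonal.

All function names are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def elastic_bursts(counts, window_sizes=(1, 2, 4, 7), k=3.0):
    """Naive multi-scale burst detection over one feature's daily counts.

    Assumption: a window of size w is bursty when its sum exceeds
    w * mu + k * sigma * sqrt(w), i.e. counts are treated as roughly
    independent draws. Returns (start, end_exclusive, w) tuples.
    """
    mu, sigma = mean(counts), pstdev(counts)
    bursts = []
    for w in window_sizes:
        threshold = w * mu + k * sigma * math.sqrt(w)
        for start in range(len(counts) - w + 1):
            if sum(counts[start:start + w]) > threshold:
                bursts.append((start, start + w, w))
    return bursts


def burst_days(counts, **kwargs):
    """Union of all time steps covered by a burst at any scale."""
    days = set()
    for start, end, _ in elastic_bursts(counts, **kwargs):
        days.update(range(start, end))
    return days


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_similarity(a, b, alpha=0.5):
    """Hypothetical score for two bursty features, each a (days, doc_ids)
    pair: weighted mean of burst-window overlap and document overlap."""
    (days_a, docs_a), (days_b, docs_b) = a, b
    return alpha * jaccard(days_a, days_b) + (1 - alpha) * jaccard(docs_a, docs_b)


def similarity_matrix(features):
    """Pairwise similarities; the diagonal (AP 'preference') is set to the
    median off-diagonal similarity, a common default."""
    n = len(features)
    S = [[feature_similarity(features[i], features[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain affinity propagation on an n x n similarity matrix (n >= 2).
    Returns the exemplar index assigned to each point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=scores.__getitem__)
            second = max(scores[j] for j in range(n) if j != first)
            for k in range(n):
                # max over k' != k of a(i, k') + s(i, k')
                best_other = second if k == first else scores[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum of max(0, r(i', k)) over i' != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    # Each exemplar labels itself; other points join their most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]
```

A moderate but sustained rise can stay below the single-day threshold and still exceed the threshold of a longer window. This is why the abstract stresses detection at multiple scales. The elastic burst detection of Zhu and Shasha uses a shifted binary tree to avoid testing every window. For clarity, this sketch simply checks every window at every size. Each cluster's exemplar and its members would correspond to one detected event.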
Source
Application Research of Computers (《计算机应用研究》)
CSCD (Chinese Science Citation Database)
Peking University Core Journal (北大核心)
2011, No. 1, pp. 117-120 (4 pages)
Funding
Scientific Research Project of the Zhejiang Provincial Department of Education (Y200908583)
Keywords
event detection
feature trajectory
multi-scale analysis
bursty feature
affinity propagation clustering