Abstract
By analyzing the characteristics of a large number of English news stories, this paper proposes a content-analysis-based topic detection algorithm. It addresses a known weakness in current topic detection research: the difficulty of telling apart two different events of the same kind, such as two separate train accidents or two separate explosions. The algorithm builds on the Single-Pass clustering strategy and uses content analysis to represent each topic as two centroid vectors: an identifier centroid and a content centroid. Experimental results show that the algorithm is simple to implement and very effective at solving this difficult-to-distinguish problem.
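The abstract does not say how identifier words and content words are selected, or how the two centroids are combined when a story is compared against a topic. The following is therefore only a minimal sketch of the general idea, not the authors' implementation. It assumes that capitalized tokens and numbers act as identifier words, that the remaining non-stopword tokens are content words, and that the two cosine similarities are mixed with a fixed weight. The names `split_story`, `single_pass`, `ident_weight`, and `threshold`, and the default values, are all hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOP = {"a", "an", "the", "on", "in", "near", "of"}

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def split_story(text):
    """Split a story into (identifier, content) term counts.

    Assumed heuristic: capitalized tokens and numbers are identifier
    words (places, dates, figures); everything else is a content word.
    """
    ident, content = Counter(), Counter()
    for tok in text.split():
        tok = tok.strip(".,;:!?\"'()")
        if not tok or tok.lower() in STOP:
            continue
        if tok[0].isupper() or tok[0].isdigit():
            ident[tok.lower()] += 1
        else:
            content[tok.lower()] += 1
    return ident, content

def single_pass(stories, threshold=0.5, ident_weight=0.6):
    """Single-Pass clustering with two centroids per topic.

    Each story is compared once against all existing topics. It joins
    the most similar topic if the similarity reaches `threshold`;
    otherwise it starts a new topic. Returns one topic index per story.
    """
    topics = []   # each topic: {"ident": Counter, "content": Counter}
    labels = []
    for text in stories:
        ident, content = split_story(text)
        best, best_sim = None, 0.0
        for k, topic in enumerate(topics):
            sim = (ident_weight * cosine(ident, topic["ident"])
                   + (1 - ident_weight) * cosine(content, topic["content"]))
            if sim > best_sim:
                best, best_sim = k, sim
        if best is not None and best_sim >= threshold:
            # Summed counts point in the same direction as the mean
            # vector, so they serve as centroids under cosine similarity.
            topics[best]["ident"].update(ident)
            topics[best]["content"].update(content)
            labels.append(best)
        else:
            topics.append({"ident": ident, "content": content})
            labels.append(len(topics) - 1)
    return labels

stories = [
    "A train derailed near Paris on Monday killing 12 passengers",
    "The train derailed near Paris on Monday injured many passengers",
    "A train derailed near Mumbai on Friday killing 30 passengers",
]
print(single_pass(stories))                    # [0, 0, 1]
print(single_pass(stories, ident_weight=0.0))  # [0, 0, 0]
```

With the identifier centroid switched off (`ident_weight=0.0`), the two train accidents collapse into one topic because their content words are nearly identical. With it switched on, the differing places and dates keep them apart, which is the kind of separation the abstract describes.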
Source
《哈尔滨工业大学学报》
EI
CAS
CSCD
PKU Core
2006, No. 10, pp. 1740-1743 (4 pages)
Journal of Harbin Institute of Technology
Funding
National Natural Science Foundation of China (60302021)
National 863 High-Tech Program of China (2004AA117010-08)
Keywords
topic detection
content analysis
detection error cost
identifier word
content word