
Topic detection research based on content analysis (cited by: 20)
Abstract: Based on an analysis of the characteristics of a large number of English news stories, this paper proposes a content-analysis-based topic detection algorithm. It targets a known difficulty in current topic detection research: telling apart two different events of the same kind, such as two separate train accidents or two separate explosions. Built on the Single-Pass clustering strategy, the algorithm uses content analysis to represent each topic as two centroid vectors: an identifier centroid and a content centroid. Experiments show that the algorithm is simple to implement and very effective at solving this hard-to-distinguish problem.
Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2006, No. 10, pp. 1740-1743 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60302021); National 863 High-Tech Program (2004AA117010-08).
Keywords: topic detection; content analysis; detection error cost; identifier word; content word
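
The two-centroid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how identifier words and content words are separated, how similarity is computed, or how the two centroids are combined, so everything below is an assumption: stories arrive already split into the two word lists, similarity is plain cosine over term counts, and a story joins a topic only when both similarities clear their own thresholds (`ident_thr` and `content_thr` are hypothetical parameters).

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Topic:
    """A topic kept as two centroids: identifier words and content words.

    The centroids are stored as summed counts; cosine similarity ignores
    scale, so the sum behaves like the mean vector here.
    """

    def __init__(self, ident, content):
        self.ident = Counter(ident)
        self.content = Counter(content)

    def add(self, ident, content):
        self.ident.update(ident)
        self.content.update(content)


def single_pass(stories, ident_thr=0.3, content_thr=0.3):
    """Assign each story to a topic in one pass over the stream.

    `stories` is a list of (identifier_words, content_words) pairs.
    Assumed decision rule: join the best topic whose identifier AND
    content similarities both reach their thresholds, else open a new topic.
    """
    topics, labels = [], []
    for ident_words, content_words in stories:
        ident, content = Counter(ident_words), Counter(content_words)
        best, best_score = None, -1.0
        for i, topic in enumerate(topics):
            s_ident = cosine(topic.ident, ident)
            s_content = cosine(topic.content, content)
            if s_ident >= ident_thr and s_content >= content_thr:
                if s_ident + s_content > best_score:
                    best, best_score = i, s_ident + s_content
        if best is None:
            topics.append(Topic(ident, content))
            best = len(topics) - 1
        else:
            topics[best].add(ident, content)
        labels.append(best)
    return labels


# Two reports on one train accident, then a different accident elsewhere.
stories = [
    (["madrid", "renfe"], ["train", "derailed", "passengers", "killed"]),
    (["madrid"], ["rescue", "train", "derailed", "passengers"]),
    (["delhi"], ["train", "derailed", "passengers", "killed"]),
]
print(single_pass(stories))  # [0, 0, 1]
```

The third story shares all its content words with the first, so a single content centroid would merge them. Under the assumed rule, its identifier similarity to the first topic is zero and it opens a new topic, which is the kind of separation the abstract describes.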



