A dynamic summarization method based on similarity-centroid multilayer filtering
Abstract: To obtain information from the web quickly and effectively, identifying and analyzing dynamically evolving web content has become a key problem that urgently needs to be solved. Dynamic multi-document summarization builds on temporal information: starting from the dynamic nature of web data, it analyzes collections on the same topic from different time periods and, by identifying differences in information content, models how the information evolves. Building on a similarity accumulation model, this paper further proposes a dynamic summarization model based on centroid-based global selection. The model analyzes the strong correlation between the current document set and the historical set, generates a set of candidate summaries that each begin with a different selected summary sentence, and then picks the best summary from the candidates with a centroid-based multilayer filtering method. This approach removes the effect that a poorly chosen first sentence has on summary quality. The model was tested on the Update task corpus of the Text Analysis Conference 2008 (TAC 2008), an international standard evaluation, and a comparison with the TAC 2008 evaluation results showed that the model is effective.
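Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), 2014, No. 10, pp. 1236-1241 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National 863 Program of China (2012AA102001); National Natural Science Foundation of China (60736014); Fundamental Research Funds for the Central Universities (2572014CB26).
Keywords: dynamic summarization model; centroid-based global selection; similarity accumulation model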
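The abstract gives only the outline of the method: build one candidate summary per possible first sentence, then keep the candidate that best matches the current documents while differing from the historical ones. The sketch below illustrates that outline under loudly stated assumptions. The bag-of-words vectors, the cosine measure, the scoring formulas, and every function name (`bow`, `cosine`, `centroid`, `build_candidate`, `best_summary`) are hypothetical choices made for illustration, and the single centroid comparison stands in for the paper's multilayer filter, whose details this record does not give.

```python
import math
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words vector for a sentence (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Sum of the vectors; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def build_candidate(seed, current, history_centroid, size):
    """Greedily grow a candidate summary whose first sentence is `seed`."""
    chosen = [seed]
    current_centroid = centroid(current)

    def score(i):
        # Assumed score: reward relevance to the current set, penalize
        # overlap with the history and with sentences already chosen.
        relevance = cosine(current[i], current_centroid)
        old = cosine(current[i], history_centroid)
        redundancy = max(cosine(current[i], current[j]) for j in chosen)
        return relevance - old - redundancy

    while len(chosen) < min(size, len(current)):
        rest = [i for i in range(len(current)) if i not in chosen]
        chosen.append(max(rest, key=score))
    return chosen


def best_summary(current_sents, history_sents, size=2):
    """Try every sentence as the first sentence, then keep the best candidate."""
    current = [bow(s) for s in current_sents]
    history_centroid = centroid(bow(s) for s in history_sents)
    current_centroid = centroid(current)
    candidates = [
        build_candidate(seed, current, history_centroid, size)
        for seed in range(len(current))
    ]

    def quality(candidate):
        # One centroid comparison as a stand-in for the multilayer filter.
        vec = centroid(current[i] for i in candidate)
        return cosine(vec, current_centroid) - cosine(vec, history_centroid)

    best = max(candidates, key=quality)
    return [current_sents[i] for i in sorted(best)]


history = ["the storm hit the coast"]
current = [
    "the storm hit the coast",
    "rescue teams reached flooded villages",
    "officials estimate damage at millions",
]
summary = best_summary(current, history, size=2)
```

Because every sentence gets a turn as the first sentence and the final choice is made over whole candidates, a weak opening sentence does not lock in a weak summary, which is the property the abstract claims for the method.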


