The similarity centroid multilayer filtering dynamic summarization method

Abstract: Identifying and analyzing dynamically evolving web content has become a key problem in research on obtaining information from the web quickly and effectively. Dynamic multi-document summarization builds on temporal information: starting from the dynamic nature of web data, it analyzes collections on the same topic from different time periods and, after identifying the differences in their information content, models how that information evolves. Building on a proposed similarity cumulative model, this paper further proposes a dynamic summarization model based on centroid integer selection (centroid-based selection of the best summary as a whole). The method analyzes the strong correlation between the current document collection and the historical collection, generates a set of candidate summaries that each begin with a different selected summary sentence, and then chooses the best summary from this set with a centroid multilayer filtering optimization method. This removes the loss in summary quality caused by a poorly chosen first sentence. Experiments on the Update task corpus of the Text Analysis Conference 2008 (TAC 2008) international evaluation produced good results, and comparison with the TAC 2008 evaluation results showed the effectiveness of the dynamic summarization models.
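The abstract describes a three-stage pipeline: score sentences by accumulated similarity while accounting for the historical collection, build one candidate summary per top-ranked first sentence, and pick the best candidate with a centroid-based multilayer filter. The record does not give the paper's formulas, so the following is only a minimal Python sketch under our own assumptions: bag-of-words cosine similarity, a "cumulative" score defined as summed similarity to the current collection minus summed similarity to the history, greedy redundancy-filtered candidate construction, and a two-layer centroid filter. All function names and parameters (`cumulative_scores`, `build_candidate`, `select_best`, `dynamic_summary`, `penalty`, `redundancy`, `keep_ratio`, `top_k`, `max_sents`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch of the pipeline described in the abstract; the
# scoring and filtering rules are illustrative assumptions, not the
# authors' published formulas.


def vectorize(sentence):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-weight vector; an empty input yields an empty vector."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = max(len(vectors), 1)
    return Counter({t: w / n for t, w in total.items()})


def cumulative_scores(current, history, penalty=1.0):
    """Assumed cumulative score: summed similarity to the other current
    sentences minus `penalty` times summed similarity to the history."""
    cur = [vectorize(s) for s in current]
    hist = [vectorize(s) for s in history]
    scores = []
    for i, v in enumerate(cur):
        support = sum(cosine(v, w) for j, w in enumerate(cur) if j != i)
        overlap = sum(cosine(v, h) for h in hist)
        scores.append(support - penalty * overlap)
    return scores


def build_candidate(first, current, scores, max_sents, redundancy=0.5):
    """Greedily extend a summary that must start with sentence `first`,
    skipping sentences too similar to ones already chosen."""
    vecs = [vectorize(s) for s in current]
    chosen = [first]
    for i in sorted(range(len(current)), key=lambda k: -scores[k]):
        if len(chosen) >= max_sents:
            break
        if i not in chosen and all(
            cosine(vecs[i], vecs[j]) < redundancy for j in chosen
        ):
            chosen.append(i)
    return chosen


def select_best(candidates, current, history, keep_ratio=0.5):
    """Assumed two-layer centroid filter over candidate summaries."""
    cur_c = centroid([vectorize(s) for s in current])
    hist_c = centroid([vectorize(s) for s in history])

    def summary_vec(idx):
        return centroid([vectorize(current[i]) for i in idx])

    # Layer 1: keep the candidates closest to the current-collection centroid.
    ranked = sorted(candidates, key=lambda c: -cosine(summary_vec(c), cur_c))
    layer1 = ranked[: max(1, math.ceil(len(ranked) * keep_ratio))]
    # Layer 2: among the survivors, pick the one least similar to the history.
    return min(layer1, key=lambda c: cosine(summary_vec(c), hist_c))


def dynamic_summary(current, history, top_k=3, max_sents=2):
    """Score sentences, build one candidate per top-k first sentence,
    then filter the candidates by centroid similarity."""
    scores = cumulative_scores(current, history)
    firsts = sorted(range(len(current)), key=lambda k: -scores[k])[:top_k]
    candidates = [build_candidate(f, current, scores, max_sents) for f in firsts]
    return [current[i] for i in select_best(candidates, current, history)]


if __name__ == "__main__":
    history = ["old news here"]
    current = ["old news here", "new fact appears"]
    print(dynamic_summary(current, history, top_k=1, max_sents=1))
    # -> ['new fact appears']
```

Trying several first sentences and deciding only at the end is what lets the sketch avoid committing to a poor opening sentence, which mirrors the motivation stated in the abstract; the specific thresholds and the number of filter layers here are arbitrary choices for illustration.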

Authors: Yu Yang (于洋), Fan Wenyi (范文义), Liu Meiling (刘美玲), Wang Huiqiang (王慧强)

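Affiliations: School of Forestry, Northeast Forestry University; College of Computer Science and Technology, Harbin Engineering University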

Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), 2014, No. 10, pp. 1236-1241 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University core journal list.

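Funding: National High Technology Research and Development Program of China (863 Program) (2012AA102001); National Natural Science Foundation of China (60736014); Fundamental Research Funds for the Central Universities (2572014CB26)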

Keywords: dynamic summarization model; centroid integer selection; similarity cumulative model

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
