Abstract
Starting from the dynamic evolution of information on the Web, this paper identifies and analyzes the document collections that belong to the same topic at different time stages, and achieves dynamic summarization by measuring how the content differs as it evolves. Two dynamic multi-document summarization models are presented: one based on matrix subspace analysis and one based on text similarity accumulation. Building on these models, efficient dynamic sentence weighting methods are proposed. Experiments on the Update Summarization test data of TAC 2008, including a comparison with the TAC 2008 evaluation results, demonstrate the effectiveness of the proposed models.
Source
Journal of Software (《软件学报》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2012, Issue 2, pp. 289-298 (10 pages)
Funding
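National Natural Science Foundation of China (60736014, 60773069, 61073130)
National High Technology Research and Development Program of China (863 Program) (2006AA010108)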
Keywords
multi-document summarization
difference analysis
matrix model
similarity accumulation
dynamic evolution