
Multi-document conceptual graph construction research based on open domain extraction (基于开放域抽取的多文档概念图构建研究)

Cited by: 2
Abstract: Under information overload, mining and organizing the core concepts and their semantic connections from multiple documents that share a common topic has become an important challenge in information extraction. This paper proposes a novel multi-document conceptual graph construction method based on open-domain information extraction. First, topic words are mined for a predefined topic, and the documents are ranked with an improved TF-IDF algorithm. Next, a series of steps, including coreference resolution, discourse weight computation, and triple instance extraction, extracts a large number of fact-expressing subject-predicate-object triple instances from the documents. To remove the noise inherent in open-domain methods and to improve extraction accuracy, a triple instance filtering algorithm is proposed. It retains a set of salient relation instances with high confidence and good semantic compatibility, and these instances form multiple conceptual subgraphs. Finally, equivalent concepts and relations across the subgraphs are merged into a single connected conceptual graph that expresses the topic well. Experiments on the Signal Media news dataset show that the method can organize important topic information within and across documents, and that the resulting conceptual graph performs well on metrics such as topic concept coverage and the compatibility of relation instances. In practical applications, the conceptual graph, as an important representation of multi-document content, is a valuable reference for users who want to explore how a given topic develops and for automatic document summarization.
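The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain smoothed TF-IDF rather than the paper's improved variant, a simple confidence threshold in place of the triple filtering algorithm, and a hand-written alias table in place of coreference resolution and concept merging. The function names, the threshold of 0.8, and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def rank_documents(docs, topic_words):
    """Return document indices sorted by summed TF-IDF weight of the topic words.

    Assumption: plain smoothed TF-IDF, not the improved variant from the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        score = sum(
            (tf[w] / total) * (math.log((1 + n) / (1 + doc_freq[w])) + 1)
            for w in topic_words
        )
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def filter_triples(triples, min_confidence=0.8):
    """Keep (subject, relation, object) keys whose confidence meets the threshold.

    Assumption: a threshold stands in for the paper's filtering algorithm.
    Duplicate triples keep their highest confidence.
    """
    best = {}
    for subj, rel, obj, conf in triples:
        key = (subj.lower(), rel.lower(), obj.lower())
        if conf >= min_confidence and conf > best.get(key, 0.0):
            best[key] = conf
    return best


def merge_into_graph(triples, aliases):
    """Map equivalent concepts to one canonical name and build an adjacency map.

    Assumption: `aliases` is a hypothetical hand-made table of equivalent concepts.
    """
    graph = defaultdict(set)
    for subj, rel, obj in triples:
        subj = aliases.get(subj, subj)
        obj = aliases.get(obj, obj)
        graph[subj].add((rel, obj))
    return dict(graph)


# Hypothetical sample data, invented for this sketch.
docs = [
    "the fed raised interest rates",
    "football match ended in a draw",
    "interest rates and inflation rates rose",
]
order = rank_documents(docs, ["interest", "rates"])
triples = [
    ("The Fed", "raised", "interest rates", 0.93),
    ("Federal Reserve", "raised", "rates", 0.40),
    ("Federal Reserve", "fights", "inflation", 0.88),
]
graph = merge_into_graph(filter_triples(triples), {"the fed": "federal reserve"})
```

In this sketch the low-confidence triple is dropped, and the two surviving triples end up under a single `federal reserve` node because the alias table treats "the fed" as an equivalent concept. That is the same idea as merging equivalent concepts across subgraphs, in a much simpler form.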
Authors: Sheng Yongpan (盛泳潘); Fu Xuefeng (付雪峰); Wu Tianxing (吴天星) (School of Computer Science & Engineering, University of Electronic Science & Technology of China, Chengdu 611731, China; School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China; School of Computer Science & Engineering, Southeast University, Nanjing 211189, China)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 1, pp. 19-25 (7 pages)
Funding: National Natural Science Foundation of China (61762063); Natural Science Foundation of Jiangxi Province (20171BAB202024); Research Project of the Jiangxi Provincial Department of Education (GJJ170991); State-Sponsored Graduate Program for Building High-Level Universities (201706070049).
Keywords: open-domain extraction; multiple documents; conceptual graph construction