Abstract
[Purpose] This work studies automatic summarization based on the bag of word vector (BOWV), using a vector space to describe semantic information. [Methods] A summary is a shortened, accurate expression of a document's content. BOWV represents words, phrases, sentences, paragraphs, and whole documents in a single vector space, where the distance between representations reflects their semantic similarity. The proposed method uses the distance between BOWVs to measure the semantic similarity between each sentence and the whole document, and extracts the sentences most similar to the document to form the summary. [Results] Experiments on the DUC01 dataset show that the method generates high-quality summaries and clearly outperforms the comparison methods. [Conclusions] The experiments show that the method significantly improves the performance of automatic summarization.
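The extraction step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify how a BOWV is built or which distance is used, so the sketch assumes a plain average of word vectors as the representation and cosine similarity as the measure. The names `TOY_VECTORS`, `embed`, `cosine`, and `summarize` are hypothetical, and the 2-D vectors are invented for the example.

```python
import math

# Toy 2-D word vectors, invented for illustration only. A real system would
# load vectors trained with word2vec or a similar model.
TOY_VECTORS = {
    "summary": (0.9, 0.1),
    "document": (0.8, 0.2),
    "vector": (0.7, 0.3),
    "weather": (0.1, 0.9),
    "rain": (0.0, 1.0),
}


def embed(tokens, vectors):
    """Average the vectors of known tokens (an assumed stand-in for a BOWV)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no token has a vector
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summarize(sentences, vectors, k=1):
    """Return the k sentences most similar to the whole document, in original order."""
    doc_tokens = [t for sentence in sentences for t in sentence]
    doc_vec = embed(doc_tokens, vectors)
    if doc_vec is None:
        return []
    scored = []
    for idx, sentence in enumerate(sentences):
        vec = embed(sentence, vectors)
        # Sentences without any known word are ranked last.
        score = cosine(vec, doc_vec) if vec is not None else float("-inf")
        scored.append((score, idx))
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for i in sorted(i for _, i in top)]


sentences = [
    ["summary", "document", "vector"],
    ["weather", "rain"],
    ["document", "summary"],
]
selected = summarize(sentences, TOY_VECTORS, k=2)
```

Here the sentences are pre-tokenized lists of words. Averaging discards word order and weights every word equally, so it is only one of several plausible ways to place sentences and documents in the same vector space.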
Source
《现代情报》
CSSCI
Peking University Core Journals (北大核心)
2017, Issue 2, pp. 8-13 (6 pages)
Journal of Modern Information
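Funding
National Natural Science Foundation of China project "Research on an Integration Mechanism for Mongolian Digital Resources Based on Domain Ontology" (Grant No. 71163029)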
Keywords
word vector
bag of word vector
automatic summarization