Abstract
Automatic text summarization is one of the more challenging problems in natural language processing. Its task is to generate a concise and meaningful summary of a text resource such as a book, an article, a news item, or a microblog post. TextRank is a graph-based text summarization algorithm that extracts keywords and generates summaries using only the target document itself, and it is widely used because it is simple and effective. Building on TextRank, this paper proposes an unsupervised, extractive joint scoring model. On the one hand, it scores sentences by combining term frequency-inverse sentence frequency (TF-ISF) cosine similarity with word-vector cosine similarity; on the other hand, it applies the Maximal Marginal Relevance (MMR) algorithm to remove redundancy from the extracted summary. Experimental results indicate that the summaries generated by the improved method are of higher quality, particularly in terms of generality and diversity.
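The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state how the two similarities are combined, so the linear weight `alpha`, the averaged word vectors used as sentence embeddings, the clamping of negative similarities to zero, the MMR trade-off `lam`, and all function names are assumptions made here for illustration. Sentences are assumed to be non-empty token lists, and `embeddings` is a hypothetical word-to-vector dictionary supplied by the caller.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def tf_isf_vectors(sentences):
    """One TF-ISF vector per tokenized sentence; ISF = log(N / sentence frequency)."""
    vocab = sorted({w for s in sentences for w in s})
    n = len(sentences)
    sf = Counter(w for s in sentences for w in set(s))
    vectors = []
    for s in sentences:
        tf = Counter(s)
        vectors.append([tf[w] / len(s) * math.log(n / sf[w]) for w in vocab])
    return vectors

def mean_embedding(sentence, embeddings, dim):
    """Average the vectors of the known words (assumed sentence representation)."""
    vecs = [embeddings[w] for w in sentence if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def joint_similarity(sentences, embeddings, dim, alpha=0.5):
    """Assumed joint score: alpha * TF-ISF cosine + (1 - alpha) * word-vector cosine."""
    tfisf = tf_isf_vectors(sentences)
    emb = [mean_embedding(s, embeddings, dim) for s in sentences]
    n = len(sentences)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                joint = (alpha * cosine(tfisf[i], tfisf[j])
                         + (1 - alpha) * cosine(emb[i], emb[j]))
                sim[i][j] = max(0.0, joint)  # keep edge weights non-negative
    return sim

def textrank(sim, d=0.85, iters=50):
    """Weighted TextRank scores by fixed-count power iteration."""
    n = len(sim)
    out = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(sim[j][i] / out[j] * scores[j]
                                    for j in range(n) if out[j] > 0)
                  for i in range(n)]
    return scores

def mmr_select(scores, sim, k, lam=0.7):
    """Greedy MMR: trade off sentence score against similarity to chosen sentences."""
    selected = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore document order

def summarize(sentences, embeddings, dim, k=2, alpha=0.5, lam=0.7):
    """Return the indices of the k sentences chosen for the summary."""
    sim = joint_similarity(sentences, embeddings, dim, alpha)
    scores = textrank(sim)
    top = max(scores)
    scores = [s / top for s in scores]  # scale scores to (0, 1] for MMR
    return mmr_select(scores, sim, k, lam)
```

Calling `summarize(tokenized_sentences, embeddings, dim, k=3)` returns sentence indices in document order. A lower `lam` penalizes redundancy more strongly, which is the role MMR plays in the abstract; `lam=1.0` reduces the selection to plain ranking by TextRank score.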
Authors
ZHU Yujia (朱玉佳)
ZHU Yongzhi (祝永志)
DONG Zhaoan (董兆安)
Qufu Normal University, Rizhao, Shandong 276826, China
Source
Communications Technology (《通信技术》)
2021, No. 2, pp. 323-326 (4 pages)
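Funding
Supported by the Natural Science Foundation of Shandong Province, project "Research on Key Issues of Crowdsourcing-Based Knowledge Completion" (No. ZR2020MF149).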
Keywords
text summary generation
TextRank
term frequency-inverse sentence frequency (TF-ISF) cosine similarity
maximal marginal relevance
word vector