TextRank Keyword Extraction Based on Multi-Feature Fusion (融合多特征的TextRank关键词抽取方法)
Cited by: 33
Abstract: [Purpose/Significance] Keyword extraction is widely used in natural language processing, and extracting keywords quickly and accurately has become a key problem in text processing. Many keyword extraction methods exist, but their accuracy still needs improvement. This study therefore proposes a keyword extraction method that combines the internal structure of a single document with the importance of a word to both the single document and the document set as a whole. [Method/Process] First, the importance of a word to the whole document set is calculated from the word's average information entropy, and its importance within a single document is calculated from its part of speech and position. Then, neural network training is used to optimize the weight allocation among the three features and fuse them. Finally, the composite weight of each word, computed from the three features, is used to improve the initial weights of the word nodes and the probability transition matrix of the TextRank model, and keywords are extracted by iteration. [Result/Conclusion] The method combines information from the whole document set with information from the single document itself, and its keyword extraction accuracy is markedly higher than that of the traditional TextRank and TFIDF-TextRank methods.
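The abstract does not give the paper's formulas, so the following is only a minimal sketch of how a fused per-word weight could bias both the initial node scores and the transition probabilities of a TextRank-style iteration. The linear fusion in `fuse`, the placeholder coefficients (the paper learns its weights with a neural network), the co-occurrence window, and the specific transition form are all assumptions made for illustration; the names `fuse` and `weighted_textrank` are hypothetical.

```python
from collections import defaultdict


def fuse(features, coeffs):
    """Linearly combine per-word feature scores (assumed fusion form).

    features: {word: (entropy_score, pos_score, position_score)}
    coeffs:   placeholder coefficients; the paper trains these instead.
    """
    return {w: sum(c * f for c, f in zip(coeffs, fs)) for w, fs in features.items()}


def weighted_textrank(tokens, weight, window=1, d=0.85, iters=50):
    """Rank words with TextRank biased by positive per-word weights."""
    # Undirected co-occurrence graph: link each token to the next `window` tokens.
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for u in tokens[i + 1 : i + 1 + window]:
            if u != w:
                neighbors[w].add(u)
                neighbors[u].add(w)
    words = sorted(neighbors)
    if not words:
        return []
    # Initial scores (also used as the teleport prior) proportional to the weights.
    total = sum(weight[w] for w in words)
    prior = {w: weight[w] / total for w in words}
    score = dict(prior)
    # Normalizer for each node's outgoing transitions.
    out = {u: sum(weight[v] for v in neighbors[u]) for u in words}
    for _ in range(iters):
        new = {}
        for w in words:
            # Assumed transition: P(u -> w) is proportional to weight[w]
            # among the neighbors of u.
            rank = sum(score[u] * weight[w] / out[u] for u in neighbors[w])
            new[w] = (1 - d) * prior[w] + d * rank
        score = new
    return sorted(score.items(), key=lambda kv: -kv[1])
```

With uniform weights this reduces to ordinary unweighted TextRank; raising a word's fused weight increases both its starting score and the share of score its neighbors pass to it.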
Authors: Li Hang (李航), Tang Chaolan (唐超兰), Yang Xian (杨贤), Shen Wanting (沈婉婷) (School of Computer Science, Guangdong University of Technology, Guangzhou; School of Art and Design, Guangdong University of Technology, Guangzhou; 510006, 510075)
Source: Journal of Intelligence (《情报杂志》), CSSCI, Peking University Core Journal, 2017, No. 8, pp. 183-187 (5 pages)
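Funding: Guangdong Province-Ministry Industry-University-Research Special Fund, enterprise innovation platform project "Research on a User Data Mining System and Experiential Design Innovation Services for the Home Appliance Industry" (No. 2013B090800042)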
Keywords: TextRank algorithm; keyword extraction; neural network; average information entropy