Abstract
[Purposes] This study examines how word topic information and word similarity information affect keyword extraction, and proposes an improved TextRank keyword extraction method. [Methods] First, a Latent Dirichlet Allocation (LDA) topic model is applied to the documents to compute word topic information. Second, FastText is used to generate word vectors, from which a word similarity matrix is computed. Finally, a combined weight that fuses word topic information and word similarity information is used to optimize the initial weights of the word nodes in TextRank, after which the word graph model is iterated and keywords are extracted. [Findings] Experiments show that the improved method yields better extraction results than the traditional method. [Conclusions] The results indicate that taking into account the global nature of word topic information and the local nature of word similarity information effectively improves the keyword extraction performance of the TextRank algorithm.
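The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the two signals are fused, so the linear blend controlled by `alpha`, the function names, and the co-occurrence window are all assumptions made for illustration. The topic scores and pairwise similarities are taken as precomputed inputs (standing in for the LDA and FastText outputs). Because plain power iteration converges to the same scores regardless of its starting point, this sketch also reuses the combined weight as the restart distribution, which is a further assumption.

```python
from collections import defaultdict


def build_graph(tokens, window=2):
    """Undirected co-occurrence graph: words within `window` positions are linked."""
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                graph[w].add(v)
                graph[v].add(w)
    return graph


def combined_weights(graph, topic_score, similarity, alpha=0.5):
    """Blend a global topic score with a local similarity score for each word.

    topic_score[w]     : assumed precomputed, e.g. derived from an LDA model.
    similarity[(u, v)] : assumed precomputed, e.g. cosine of FastText vectors.
    The linear blend with `alpha` is an illustrative assumption.
    """
    raw = {}
    for w, nbrs in graph.items():
        # Local signal: mean similarity between w and its graph neighbours.
        local = sum(
            similarity.get((w, v), similarity.get((v, w), 0.0)) for v in nbrs
        ) / len(nbrs)
        raw[w] = alpha * topic_score.get(w, 0.0) + (1 - alpha) * local
    total = sum(raw.values())
    if total == 0:
        # No usable signal: fall back to uniform weights.
        return {w: 1.0 / len(raw) for w in raw}
    return {w: x / total for w, x in raw.items()}


def textrank(graph, init, d=0.85, iters=50):
    """TextRank iteration that starts from `init` and restarts to `init`."""
    scores = dict(init)
    for _ in range(iters):
        scores = {
            w: (1 - d) * init[w]
            + d * sum(scores[v] / len(graph[v]) for v in graph[w])
            for w in graph
        }
    return scores


def extract_keywords(tokens, topic_score, similarity, top_k=3):
    """Return the `top_k` highest-scoring words."""
    graph = build_graph(tokens)
    init = combined_weights(graph, topic_score, similarity)
    scores = textrank(graph, init)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In practice `topic_score` and `similarity` would be derived from a trained topic model and word vectors; here they are plain dictionaries so that the weighting and iteration steps can be read in isolation.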
Authors
WANG Tao (王涛); LI Ming (李明)
School of Computer and Information Sciences, Chongqing Normal University, Chongqing 401331, China
Source
Journal of Chongqing Normal University: Natural Science (《重庆师范大学学报(自然科学版)》)
Indexed in: CAS; PKU Core (北大核心)
2019, No. 3, pp. 98-104 (7 pages)
Funding
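Chongqing Municipal Education Commission Teaching Reform Project (No. 092055)
Chongqing Municipal Education Commission Science and Technology Project (No. kj098820)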