Abstract
[Objective/Significance] Keyword extraction helps users quickly locate core content in massive volumes of text, which is of great significance to intelligence collection. At present, keyword extraction relies mainly on word frequency and co-occurrence relationships, ignoring the guiding role that knowledge bases can play. [Methods/Process] This paper proposes a knowledge-fused keyword extraction method. First, a lexical knowledge graph is built from sememes and Cilin. Next, this graph is combined with word co-occurrence relationships to generate a new transition probability matrix. Finally, keywords are extracted. [Results/Conclusions] Experiments on a large-scale dataset of abstracts show that the knowledge-fused method effectively improves the performance of existing keyword extraction methods.
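The three steps in the abstract can be illustrated with a minimal sketch. The abstract does not say how the knowledge graph and the co-occurrence graph are combined, so everything below is an assumption made for illustration: a TextRank-style random walk, a linear interpolation weight `alpha`, a sliding co-occurrence window, and a hand-written `similar` table standing in for similarities that would come from sememes and Cilin. None of the function names come from the paper.

```python
from collections import defaultdict


def cooccurrence_weights(tokens, window=3):
    """Count symmetric co-occurrences of distinct words within `window` positions."""
    weights = defaultdict(float)
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + window]:
            if a != b:
                weights[(a, b)] += 1.0
                weights[(b, a)] += 1.0
    return weights


def row_normalize(weights, vocab):
    """Turn pairwise weights into a row-stochastic matrix stored as dict of dicts."""
    matrix = {}
    for u in vocab:
        total = sum(weights.get((u, v), 0.0) for v in vocab)
        if total > 0:
            matrix[u] = {v: weights.get((u, v), 0.0) / total for v in vocab}
        else:
            # A word with no edges jumps uniformly to any word.
            matrix[u] = {v: 1.0 / len(vocab) for v in vocab}
    return matrix


def fused_transition(cooc, knowledge, vocab, alpha=0.5):
    """Assumed fusion rule: interpolate the two normalized matrices with weight alpha."""
    p_cooc = row_normalize(cooc, vocab)
    p_know = row_normalize(knowledge, vocab)
    return {
        u: {v: alpha * p_cooc[u][v] + (1 - alpha) * p_know[u][v] for v in vocab}
        for u in vocab
    }


def rank_words(transition, vocab, damping=0.85, iterations=50):
    """Run a PageRank-style power iteration and return (word, score) pairs, best first."""
    scores = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        scores = {
            v: (1 - damping) / len(vocab)
            + damping * sum(scores[u] * transition[u][v] for u in vocab)
            for v in vocab
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy input; the similarity values are hypothetical, not taken from any knowledge base.
tokens = ["keyword", "extraction", "knowledge", "graph", "keyword", "graph"]
vocab = sorted(set(tokens))
similar = {("keyword", "knowledge"): 0.4, ("graph", "knowledge"): 0.8}
knowledge = {}
for (a, b), s in similar.items():
    knowledge[(a, b)] = s
    knowledge[(b, a)] = s

transition = fused_transition(cooccurrence_weights(tokens), knowledge, vocab, alpha=0.6)
ranking = rank_words(transition, vocab)
print(ranking[:3])
```

Interpolating two row-normalized matrices keeps every row summing to 1, so the result is still a valid transition matrix. Other fusion rules, such as multiplying edge weights, are equally plausible readings of the abstract.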
Authors
段建勇
鲁朝阳
王昊
李欣
何丽
DUAN Jianyong; LU Zhaoyang; WANG Hao; LI Xin; HE Li (School of Information, North China University of Technology, Beijing 100144, China; The Key Laboratory of Rich-Media Knowledge Organization and Service of Digital Publishing Content, Beijing 100036, China; CNONIX National Standard and Promotion Laboratory, North China University of Technology, Beijing 100144, China)
Source
Technology Intelligence Engineering (《情报工程》)
2022, Issue 3, pp. 3-12 (10 pages)
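Funding
National Natural Science Foundation of China projects "Research on Chinese Query Error Correction Methods Based on Multi-Source Feature Learning" (61672040)
and "Research on Computational Models of Query Timeliness for News Events" (61972003)
Open Fund of the Key Laboratory of Rich-Media Knowledge Organization and Service of Digital Publishing Content, "Research on Keyword Techniques for Vertical-Domain Knowledge Graph Construction" (ZD2021-11/05)
Scientific Research Program of the Beijing Municipal Education Commission (KM202210009002).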
Keywords
Keyword extraction
knowledge fusion
sememe
Cilin