Abstract
[Purpose/significance] Knowledge graphs of scientific articles support innovative knowledge services such as semantic literature retrieval, precise academic recommendation, and intelligent subject question answering. However, many entities in such graphs lack relation links, which hinders the upgrading of these services. Translation models are the mainstream approach to relation prediction in knowledge graphs, but typical translation models are weak at dynamic representation, attribute differentiation, and text feature fusion, so they are difficult to apply directly to relation prediction in knowledge graphs of scientific articles. [Method/process] This paper proposes an improved translation model, CoTransH, to predict semantic relations in knowledge graphs of scientific articles. In the data preparation layer, an annotated corpus for relation prediction is first constructed automatically by combining move recognition, entity extraction, and semantic similarity measurement; vectors are then generated dynamically by fusing text features with external prior knowledge, which strengthens the model's semantic representation learning in the open world. In the model structure layer, a hyperplane mechanism is first introduced to handle many-to-many relation prediction; a nonlinear convolutional layer is then added to distinguish the attributes of head and tail entities; the score function is next modified to give more weight to the relation; finally, the negative example generation strategy is adapted to the characteristics of the corpus to improve relation prediction accuracy. [Result/conclusion] CoTransH was used to construct a knowledge graph of the artificial intelligence (AI) field whose nodes are the "problem" and "method" phrases contained in the abstracts of AI scientific articles and whose edges are the "adoption" and "solution" relations. In the closed-world setting, the relation prediction F1 score of CoTransH is on average 12.1% higher than that of typical translation models (TransE, TransH, TransD, KG2E); in the open-world setting, it is on average 38.46% higher than that of TransH. CoTransH can fuse the semantic and geometric features of entities and complete relations in knowledge graphs of scientific articles efficiently. [Limitations] The proposed CoTransH model cannot yet predict polysemous relations.
Source
《情报理论与实践》
CSSCI
Peking University Core Journals (北大核心)
2021, No. 11, pp. 187-196 (10 pages)
Information Studies: Theory & Application
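Funding
Research supported by the National Natural Science Foundation of China Youth Science Fund project "Annotation and Evaluation of Semantic Relations between Geographic Entities in Chinese Web Text" (grant no. 41801320)
and by the Open Fund of the State Key Laboratory of Resources and Environmental Information System.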
Keywords
knowledge graph of scientific articles (科技文献知识图谱)
knowledge graph completion (知识图谱补全)
relation prediction (关系预测)
translation models (翻译模型)
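The hyperplane mechanism mentioned in the abstract originates in TransH, one of the baseline models it names. The following is a minimal sketch of the standard TransH scoring step only, offered to illustrate why projection onto a relation-specific hyperplane helps with many-to-many relations. It is not the CoTransH model: the convolutional layer, the modified score function, and the negative example strategy are not specified in this record. All function names, variable names, and vectors below are hypothetical and chosen for illustration.

```python
import math

def project(v, w):
    """Project vector v onto the hyperplane whose unit normal is w."""
    dot = sum(a * b for a, b in zip(v, w))
    return [a - dot * b for a, b in zip(v, w)]

def transh_score(h, t, d_r, w_r):
    """Standard TransH distance: ||h_perp + d_r - t_perp||^2 (lower is better).

    h, t : head and tail entity embeddings
    d_r  : relation translation vector lying in the hyperplane
    w_r  : normal vector of the relation's hyperplane (normalized here)
    """
    norm = math.sqrt(sum(x * x for x in w_r))
    w = [x / norm for x in w_r]
    h_p = project(h, w)
    t_p = project(t, w)
    return sum((a + d - b) ** 2 for a, d, b in zip(h_p, d_r, t_p))

# Toy 3-dimensional embeddings (illustrative values, not trained).
w_r = [0.0, 0.0, 1.0]      # hyperplane normal for a hypothetical "solution" relation
d_r = [1.0, 1.0, 0.0]      # translation within that hyperplane
h = [1.0, 0.0, 5.0]        # a hypothetical "method" entity
t_a = [2.0, 1.0, -3.0]     # two distinct "problem" entities that differ
t_b = [2.0, 1.0, 7.0]      # only along the normal direction
t_c = [0.0, 0.0, 0.0]      # an unrelated entity

score_a = transh_score(h, t_a, d_r, w_r)
score_b = transh_score(h, t_b, d_r, w_r)
score_c = transh_score(h, t_c, d_r, w_r)
```

Because `t_a` and `t_b` differ only along the normal `w_r`, they project to the same point on the hyperplane and receive the same score, so one head can be linked to several tails without forcing their embeddings to coincide. A plain translation model such as TransE, which scores `h + r - t` directly, would have to pull those tails toward a single point.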