Abstract
Traditional domain-specific entity extraction algorithms rely mainly on expert knowledge, require a large annotation effort, and are difficult to apply to new fields. To address this problem, this paper proposes an entity extraction algorithm based on remote supervision and applies it to the grain and oil storage domain. Within the positive-unlabeled (PU) learning framework, the algorithm extracts entities in two stages: entity determination and entity classification. First, a bidirectional long short-term memory (BiLSTM) network performs binary entity identification. Second, a fully connected network identifies the entity type. Finally, the extracted entities are used to construct a knowledge graph for the grain and oil storage domain, which verifies the feasibility of the algorithm. The results show that the algorithm is suitable for entity extraction tasks with few training entity samples: it reduces the corpus size that BiLSTM-based entity extraction requires while achieving results comparable to those of the classical BiLSTM-based algorithm.
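As a rough illustration of the remote-supervision and PU-learning ideas named in the abstract, the sketch below labels tokens by dictionary matching and then scores a binary entity classifier with a PU risk. The abstract does not state the matching scheme or the risk estimator, so the greedy longest match, the non-negative PU risk with absolute loss, the function names `distant_label` and `nn_pu_risk`, and the toy lexicon are all assumptions made for illustration. Plain probabilities stand in for the BiLSTM outputs.

```python
from typing import List, Sequence, Set, Tuple


def distant_label(tokens: Sequence[str],
                  lexicon: Set[Tuple[str, ...]],
                  max_len: int = 4) -> List[int]:
    """Return 1 for tokens covered by a lexicon entry, 0 for unlabeled tokens.

    Assumption: greedy longest match from left to right. A 0 does not mean
    "not an entity"; it only means the lexicon did not cover the token.
    """
    labels = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                labels[j] = 1
            i += matched
        else:
            i += 1
    return labels


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def nn_pu_risk(pos_probs: Sequence[float],
               unl_probs: Sequence[float],
               prior: float) -> float:
    """Non-negative PU risk with absolute loss (an assumed estimator).

    pos_probs: predicted entity probabilities for positive (matched) tokens.
    unl_probs: predicted entity probabilities for unlabeled tokens.
    prior:     assumed fraction of true entities among all tokens.
    """
    # Cost of calling labeled positives "non-entity".
    risk_pos_as_pos = _mean([1.0 - p for p in pos_probs])
    # Cost of calling labeled positives "entity" if they were negatives.
    risk_pos_as_neg = _mean(list(pos_probs))
    # Cost of calling unlabeled tokens "entity" if they were all negatives.
    risk_unl_as_neg = _mean(list(unl_probs))
    # Unlabeled data mixes positives and negatives, so subtract the positive
    # share; clip at zero to keep the negative-class risk non-negative.
    negative_part = max(0.0, risk_unl_as_neg - prior * risk_pos_as_neg)
    return prior * risk_pos_as_pos + negative_part


# Hypothetical toy data, not taken from the paper.
toy_lexicon = {("wheat",), ("grain", "silo")}
toy_tokens = ["wheat", "is", "stored", "in", "grain", "silo"]
print(distant_label(toy_tokens, toy_lexicon))
print(nn_pu_risk([0.9, 0.8], [0.1, 0.2, 0.7], prior=0.3))
```

In a full pipeline of the kind the abstract describes, the probabilities would come from the BiLSTM binary classifier, and tokens judged to be entities would then pass to a separate fully connected classifier for type assignment.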
Authors
葛亮
张艺璇
李伟平
GE Liang; ZHANG Yixuan; LI Weiping (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; School of Software and Microelectronics, Peking University, Beijing 100871, China)
Source
《哈尔滨工程大学学报》
EI
CAS
CSCD
PKU Core
2022, Issue 4, pp. 564-571 (8 pages)
Journal of Harbin Engineering University
Funding
National Key Research and Development Program of China (2020YFC0833301).
Keywords
domain-specific knowledge graph
ontology design
entity extraction
remote supervision
deep learning
positive-unlabeled learning
bidirectional long short-term memory network
knowledge graph construction