Abstract
Named Entity Recognition (NER) is a key task in Natural Language Processing (NLP), and its performance affects many downstream tasks. Based on existing knowledge graphs, this paper proposes nine major categories of named entities for the field of Library and Information Science (LIS) and builds a LISERNIE+BiGRU+CRF model for recognizing LIS entities. The LISERNIE model is trained on top of ERNIE, with an additional pre-training stage that injects LIS domain knowledge. Extensive experiments show that the LISERNIE+BiGRU+CRF model can effectively recognize these named entities and has a clear performance advantage on small-scale annotated datasets. When applied to the subsequent open-domain relation extraction experiment, its accuracy is far higher than that of the CORE system. The model can thus provide model and data support for further building knowledge graphs, question answering systems, and machine reading applications.
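The abstract names the LISERNIE+BiGRU+CRF architecture but gives no implementation detail, so the following is a minimal, hypothetical sketch of such a tagging head, assuming PyTorch. A plain embedding layer stands in for the LISERNIE encoder (the paper's domain-adapted ERNIE), followed by a bidirectional GRU, per-token emission scores, and a small hand-rolled linear-chain CRF with Viterbi decoding; all dimensions, tag counts, and names are illustrative assumptions rather than the authors' code.

```python
# Minimal sketch (not the authors' released code) of a BiGRU+CRF tagging head.
import torch
import torch.nn as nn

class BiGRUCRFTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=128, hidden_dim=128):
        super().__init__()
        # Stand-in for the LISERNIE encoder; in practice its contextual
        # representations would replace this embedding lookup.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bigru = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden_dim, num_tags)              # per-token emission scores
        self.trans = nn.Parameter(torch.zeros(num_tags, num_tags))   # CRF transition scores [prev, next]

    def emissions(self, token_ids):
        h, _ = self.bigru(self.embed(token_ids))
        return self.emit(h)                                          # (batch, seq_len, num_tags)

    def nll(self, token_ids, tags):
        """Negative log-likelihood of gold tag sequences under the linear-chain CRF."""
        e = self.emissions(token_ids)
        batch, seq_len, _ = e.shape
        idx = torch.arange(batch)
        # Score of the gold path: emissions plus transitions between consecutive tags.
        gold = e[idx, 0, tags[:, 0]]
        for t in range(1, seq_len):
            gold = gold + e[idx, t, tags[:, t]] + self.trans[tags[:, t - 1], tags[:, t]]
        # Log partition function via the forward algorithm.
        alpha = e[:, 0]                                              # (batch, num_tags)
        for t in range(1, seq_len):
            alpha = torch.logsumexp(alpha.unsqueeze(2) + self.trans, dim=1) + e[:, t]
        return (torch.logsumexp(alpha, dim=1) - gold).mean()

    @torch.no_grad()
    def decode(self, token_ids):
        """Viterbi decoding: most likely tag sequence for each sentence."""
        e = self.emissions(token_ids)
        batch, seq_len, num_tags = e.shape
        score, back = e[:, 0], []
        for t in range(1, seq_len):
            s = score.unsqueeze(2) + self.trans                      # (batch, prev, next)
            best, argbest = s.max(dim=1)
            back.append(argbest)
            score = best + e[:, t]
        paths = [score.argmax(dim=1)]
        for argbest in reversed(back):                               # backtrack from last position
            paths.append(argbest[torch.arange(batch), paths[-1]])
        return torch.stack(list(reversed(paths)), dim=1)             # (batch, seq_len)

# Toy usage with an illustrative tag set of size 4 (e.g., O, B-PER, I-PER, B-ORG).
model = BiGRUCRFTagger(vocab_size=100, num_tags=4)
tokens = torch.randint(0, 100, (2, 6))
tags = torch.randint(0, 4, (2, 6))
loss = model.nll(tokens, tags)
loss.backward()
print(loss.item(), model.decode(tokens).shape)
```

The CRF layer is what distinguishes this head from a plain softmax classifier: the learned transition matrix lets decoding penalize implausible tag sequences (for example, an I- tag directly after an O tag), which is the usual motivation for placing a BiGRU+CRF head on top of a pre-trained encoder.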
Authors
王娟
王志红
曹树金
WANG Juan; WANG Zhihong; CAO Shujin
Source
《图书馆论坛》
CSSCI
Peking University Core Journal (北大核心)
2023, No. 7, pp. 15-25 (11 pages)
Library Tribune
Funding
General Project of the National Social Science Fund of China, "Research on Deep Learning-Based Discovery of Web Academic Intelligence in Subject Domains" (Project No. 18BTQ065)
General Grant of the China Postdoctoral Science Foundation, "Research on Session Search Optimization Based on User Relevance Feedback in Complex Tasks" (Project No. 2021M691823)
Guangzhou Science and Technology Plan Project, "Research on Deep Learning-Based Software Vulnerability Mining Methods" (Project No. 202201010100). This paper is an output of these projects.
Keywords
Named Entity Recognition (NER)
knowledge graph
pre-trained language model
domain knowledge