Abstract
Named entity recognition is comparatively easy for general-domain text, but professional terminology contains many nested named entities that are hard to recognize, which makes it one of the core challenges in building a knowledge graph for the aerospace field. Existing named entity recognition techniques mostly use a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF), which struggles to distinguish complex relationships such as nesting and intersection among terms in the missile domain. To address this problem, a nested named entity recognition method that fuses linguistic features with machine reading comprehension is proposed, built on nested-entity annotation of domain text. The method introduces prior knowledge, changes the decoding scheme, and performs multi-task prediction in question-and-answer form. Experiments show that the proposed method effectively improves the accuracy and recall of nested entity recognition on missile-domain text, and that its overall F1 score is 13.89% higher than that of a BiLSTM-CRF-based nested named entity recognition method.
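The question-and-answer framing can be illustrated with a minimal sketch. It assumes the common machine-reading-comprehension setup in which each entity type gets its own query and the model scores start positions, end positions, and start-end pairings; the abstract does not describe the authors' architecture in this detail, so this is not their implementation. The entity types, queries, example tokens, and hand-written scores below are all hypothetical stand-ins for real model output.

```python
from typing import Dict, List, Tuple

Scores = Tuple[List[float], List[float], List[List[float]]]

# Hypothetical entity types and queries (assumptions, not taken from the paper).
QUERIES: Dict[str, str] = {
    "MISSILE": "Which missile systems are mentioned in the text?",
    "COMPONENT": "Which missile components are mentioned in the text?",
}


def build_input(query: str, tokens: List[str]) -> str:
    """Pair a type-specific query with the text, as an MRC-style encoder input."""
    return f"[CLS] {query} [SEP] {' '.join(tokens)} [SEP]"


def decode_spans(scores: Scores, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Keep every (start, end) pair whose start, end, and pairing scores pass the threshold."""
    start_probs, end_probs, match_probs = scores
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [j for j, p in enumerate(end_probs) if p > threshold]
    return [(i, j) for i in starts for j in ends
            if i <= j and match_probs[i][j] > threshold]


def extract_entities(tokens: List[str],
                     scores_by_type: Dict[str, Scores]) -> List[Tuple[str, str]]:
    """Decode each entity type independently, so spans of different types may nest."""
    entities = []
    for etype, scores in scores_by_type.items():
        for i, j in decode_spans(scores):
            entities.append((etype, " ".join(tokens[i:j + 1])))
    return entities


def toy_scores(n: int, span: Tuple[int, int]) -> Scores:
    """Hand-written scores that mark exactly one span (stand-in for a trained model)."""
    start = [0.1] * n
    end = [0.1] * n
    match = [[0.0] * n for _ in range(n)]
    start[span[0]], end[span[1]] = 0.9, 0.9
    match[span[0]][span[1]] = 0.9
    return start, end, match


tokens = ["air", "defense", "missile", "radar", "seeker"]
scores_by_type = {
    "MISSILE": toy_scores(len(tokens), (0, 2)),
    "COMPONENT": toy_scores(len(tokens), (0, 4)),
}
entities = extract_entities(tokens, scores_by_type)
```

Because each entity type is queried and decoded separately, the shorter MISSILE span can sit inside the longer COMPONENT span. A single BIO tag sequence assigns one label per token and cannot represent that nesting directly.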
Authors
关景文
宋晓
李晓庆
杨彤
周军华
Guan Jingwen; Song Xiao; Li Xiaoqing; Yang Tong; Zhou Junhua (School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; School of Cyber Science and Technology, Beihang University, Beijing 100191, China; Beijing Simulation Center, Beijing 100854, China)
Source
《系统仿真学报》
CAS
CSCD
PKU Core Journals (北大核心)
2023, Issue 8, pp. 1757-1767 (11 pages)
Journal of System Simulation
Funding
National Key Research and Development Program of China (2020YFB1712203).
Keywords
missile
nested named entity recognition
knowledge extraction
machine reading comprehension
linguistic features