Abstract
To avoid the error accumulation and entity overlap problems that arise when entities and relations are extracted independently, a joint extraction model based on BERT and non-autoregressive decoding is proposed for medical knowledge extraction. First, sentences are encoded with the BERT pre-trained language model. Next, a non-autoregressive (NAR) method decodes in parallel: it extracts the relation type and then extracts the entities from the position indices of the head and tail entities, yielding relation triples of medical entities. Finally, the extracted entities and relations are imported into the Neo4j graph database for knowledge visualization. The dataset was obtained by manually annotating data from electronic medical records. Experimental results show that the joint learning model based on BERT and non-autoregressive decoding achieves an F1 of 0.92, a precision of 0.93, and a recall of 0.92. All three metrics improve on existing models, indicating that the proposed method can effectively extract medical knowledge from electronic medical records.
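The step in which entities are recovered from head and tail position indices can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Prediction` layout, the `NO_RELATION` label, the `decode_triples` helper, and the toy sentence are all hypothetical, and the sketch assumes that the parallel decoder emits a fixed number of slots, each holding a relation label and inclusive token spans for the head and tail entities.

```python
from typing import List, NamedTuple, Tuple

NO_RELATION = "NA"  # hypothetical label marking an empty decoder slot


class Prediction(NamedTuple):
    """One slot of a parallel decoder's output (assumed layout)."""
    relation: str                # predicted relation label or NO_RELATION
    head_span: Tuple[int, int]   # inclusive (start, end) token indices
    tail_span: Tuple[int, int]   # inclusive (start, end) token indices


def decode_triples(tokens: List[str],
                   predictions: List[Prediction],
                   sep: str = " ") -> List[Tuple[str, str, str]]:
    """Turn slot predictions into (head, relation, tail) triples.

    All slots are independent of each other, so they can be decoded in
    any order; empty slots, out-of-range spans, and duplicates are dropped.
    """
    n = len(tokens)
    triples: List[Tuple[str, str, str]] = []
    for p in predictions:
        if p.relation == NO_RELATION:
            continue  # slot predicted "no triple"
        hs, he = p.head_span
        ts, te = p.tail_span
        if not (0 <= hs <= he < n and 0 <= ts <= te < n):
            continue  # discard spans that do not fit the sentence
        triple = (sep.join(tokens[hs:he + 1]),
                  p.relation,
                  sep.join(tokens[ts:te + 1]))
        if triple not in triples:
            triples.append(triple)
    return triples


# Toy example: one head entity shared by two triples (entity overlap).
tokens = ["aspirin", "relieves", "headache", "and", "fever"]
predictions = [
    Prediction("treats", (0, 0), (2, 2)),
    Prediction("treats", (0, 0), (4, 4)),
    Prediction(NO_RELATION, (0, 0), (0, 0)),
]
print(decode_triples(tokens, predictions))
```

Because each slot carries its own spans, the same entity can appear in several triples, which is one way a span-index formulation can accommodate overlapping entities.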
Authors
YU Qing (于清)
MA Zhi-long (马志龙)
XU Chun (徐春)
School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
Source
Computer and Modernization (《计算机与现代化》)
2023, No. 1, pp. 120-126 (7 pages)
Funding
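Natural Science Foundation of Xinjiang Uygur Autonomous Region (2019D01A23)
Scientific Research Program of Universities in Xinjiang Uygur Autonomous Region (XJEDU2021Y038)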
Keywords
joint learning
non-autoregressive
BERT
entity overlap
electronic medical record