Abstract
Entity recognition in electronic medical records (EMRs) is an important fundamental task in smart healthcare services. Hospitals currently analyze medical record text manually during diagnosis and treatment, which is inefficient and prone to omitting key information. To address this, an entity recognition model combining BERT with a conditional random field (CRF) is proposed. Starting from a Chinese pre-trained BERT model built on bidirectionally trained Transformers, the model parameters are fine-tuned on a manually annotated corpus that follows the BIOES tagging scheme. BERT learns state features of the character sequence, and the resulting sequence state scores are fed into a CRF layer, which constrains and optimizes the transitions between sequence states. With its large parameter count, strong feature-extraction ability, and multi-dimensional semantic representations of entities, BERT can effectively improve entity extraction. Experimental results show that the proposed BERT-CRF model achieves an entity recognition F1 score above 88%, significantly outperforming traditional recurrent neural network (RNN) and convolutional neural network (CNN) models.
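The abstract describes characters annotated with BIOES tags, BERT producing per-character state scores, and a CRF layer that constrains transitions between states. As a rough illustration only (not the authors' implementation), the Python sketch below decodes toy state scores with the Viterbi algorithm under hard BIOES transition constraints, extracts entity spans, and computes an exact-match entity-level F1. The entity types `DIS` and `DRUG`, the toy scores, and all function names are hypothetical; a trained CRF would also learn real-valued transition scores rather than the 0 / -inf mask assumed here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical entity types for illustration; the paper's label set is not
# listed in the abstract.
ENTITY_TYPES = ["DIS", "DRUG"]
TAGS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in "BIES"]
NEG_INF = -math.inf
Span = Tuple[int, int, str]  # (start, end inclusive, entity type)


def allowed(prev: str, curr: str) -> bool:
    """True if the BIOES scheme permits tag `curr` directly after tag `prev`."""
    p_pre, _, p_type = prev.partition("-")
    c_pre, _, c_type = curr.partition("-")
    if p_pre in ("O", "E", "S"):  # no entity is open: start fresh or stay outside
        return c_pre in ("O", "B", "S")
    # After B or I the same entity must continue (I) or close (E).
    return c_pre in ("I", "E") and c_type == p_type


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[str]:
    """Highest-scoring tag path; emissions[t][j] scores tag j at position t,
    transitions[i][j] scores moving from tag i to tag j."""
    n = len(TAGS)
    # A sequence may not start inside an entity or end with one still open.
    start = [0.0 if TAGS[j][0] in "OBS" else NEG_INF for j in range(n)]
    end = [0.0 if TAGS[j][0] in "OES" else NEG_INF for j in range(n)]
    score = [start[j] + emissions[0][j] for j in range(n)]
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    score = [score[j] + end[j] for j in range(n)]
    path = [max(range(n), key=lambda j: score[j])]
    for ptr in reversed(backptrs):  # follow back-pointers from the last position
        path.append(ptr[path[-1]])
    return [TAGS[j] for j in reversed(path)]


def spans(tags: List[str]) -> List[Span]:
    """Extract entity spans from a well-formed BIOES tag sequence."""
    out, start = [], None
    for i, tag in enumerate(tags):
        pre, _, typ = tag.partition("-")
        if pre == "S":
            out.append((i, i, typ))
            start = None
        elif pre == "B":
            start = i
        elif pre == "E" and start is not None:
            out.append((start, i, typ))
            start = None
        elif pre == "O":
            start = None
    return out


def entity_f1(gold: List[Span], pred: List[Span]) -> float:
    """Entity-level F1 where a prediction counts only on an exact span+type match."""
    g, p = set(gold), set(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


def to_matrix(raw: List[Dict[str, float]]) -> List[List[float]]:
    """Expand sparse {tag: score} dicts into a dense emission matrix (0.0 default)."""
    return [[pos.get(tag, 0.0) for tag in TAGS] for pos in raw]


# Hard BIOES constraint only: 0 for allowed moves, -inf for forbidden ones.
# A trained CRF would learn real-valued transition scores on top of this.
TRANSITIONS = [[0.0 if allowed(a, b) else NEG_INF for b in TAGS] for a in TAGS]

if __name__ == "__main__":
    # Toy per-character scores for a 4-character input (stand-ins for BERT outputs).
    raw = [{"B-DIS": 2.0}, {"I-DIS": 1.0}, {"E-DRUG": 1.5, "E-DIS": 1.0}, {"O": 1.0}]
    path = viterbi(to_matrix(raw), TRANSITIONS)
    print(path)  # ['B-DIS', 'I-DIS', 'E-DIS', 'O']
    gold = [(0, 2, "DIS"), (3, 3, "DRUG")]
    print(round(entity_f1(gold, spans(path)), 4))  # 0.6667
```

Taking the per-position maximum of these toy scores would give the invalid sequence B-DIS, I-DIS, E-DRUG, O; the transition mask forces the decoder onto the valid path B-DIS, I-DIS, E-DIS, O, which is the kind of sequence-level constraint the abstract attributes to the CRF layer.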
Authors
HE Tao
CHEN Jian
WEN Yingyou
Affiliations: Neusoft Research, Northeastern University, Shenyang 110169; Research Center of Safety Engineering Technology in Industrial Control of Liaoning Province, Shenyang 110169
Source
Computer & Digital Engineering (《计算机与数字工程》)
2022, No. 3, pp. 639-643 (5 pages)
Funding
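National Key Research and Development Program of China (No. 2018YFC0830601)
Key Research and Development Program of Liaoning Province (No. 2019JH2/10100027)
Fundamental Research Funds of the Ministry of Education (No. N171802001)
Liaoning Revitalization Talents Program (No. XLYC1802100)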
Keywords
deep learning
BERT
conditional random field
named entity recognition
electronic medical records