Abstract
Electronic medical records (EMRs) are an important resource for storing, managing, and transmitting patients' medical records, and they are key textual records of how doctors diagnose and treat diseases. Named entity recognition (NER) on EMRs can efficiently and automatically extract diagnosis and treatment information such as symptoms, diseases, and drug names. This helps structure EMRs so that machine learning and other techniques can be used to mine diagnosis and treatment patterns. To identify named entities in EMRs efficiently, this paper proposes an NER method based on BERT and a bidirectional long short-term memory network (BILSTM) with a conditional random field (CRF) layer, augmented with adversarial training via the fast gradient method (FGM). The model is referred to as BERT-BILSTM-CRF-FGM (BBCF). The Chinese EMR corpus provided by the 2017 China Conference on Knowledge Graph and Semantic Computing (CCKS2017) is first preprocessed, including corrections to the corpus. On this corpus, the BBCF model recognizes five types of entities with an average F1 score of 92.84%. Compared with the BERT model based on an iterated dilated convolutional neural network (BERT-IDCNN-CRF) and the BILSTM-based conditional random field model (BILSTM-CRF), the proposed method achieves a higher F1 score and faster convergence, and can therefore structure EMR text more efficiently.
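The abstract names FGM adversarial training as the component added on top of BERT-BILSTM-CRF but does not describe how it works. As a rough illustration only, the sketch below shows the core FGM step on a toy logistic model in plain Python: compute the gradient of the loss with respect to the input embedding, perturb the embedding by r_adv = epsilon * g / ||g||_2, and evaluate the loss again on the perturbed input. The toy model, the function names (`fgm_perturbation`, `fgm_losses`, etc.), and the value of `epsilon` are hypothetical; the abstract does not say which embeddings BBCF perturbs or which epsilon it uses.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def logistic_loss(w, e, y):
    """Toy loss log(1 + exp(-y * w.e)) for a label y in {-1, +1} (not numerically hardened)."""
    return math.log1p(math.exp(-y * dot(w, e)))


def grad_wrt_embedding(w, e, y):
    """Analytic gradient of logistic_loss with respect to the embedding e."""
    s = 1.0 / (1.0 + math.exp(y * dot(w, e)))  # sigmoid(-y * w.e)
    return [-y * wi * s for wi in w]


def fgm_perturbation(grad, epsilon=1.0):
    """FGM step: r_adv = epsilon * g / ||g||_2, or a zero vector if g vanishes."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def fgm_losses(w, e, y, epsilon=1.0):
    """Return (clean loss, adversarial loss, perturbation) for one example.

    Training would minimize clean + adversarial loss; in a real tagger the two
    backward passes accumulate gradients on the shared model parameters.
    """
    clean = logistic_loss(w, e, y)
    r_adv = fgm_perturbation(grad_wrt_embedding(w, e, y), epsilon)
    e_adv = [ei + ri for ei, ri in zip(e, r_adv)]
    return clean, logistic_loss(w, e_adv, y), r_adv


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0]  # hypothetical weights
    e = [0.2, 0.1, 0.3]   # hypothetical token embedding
    clean, adv, r = fgm_losses(w, e, y=1, epsilon=0.5)
    # The perturbation points uphill, so the adversarial loss exceeds the clean loss here.
    print(f"clean={clean:.4f} adversarial={adv:.4f}")
```

In typical PyTorch implementations of FGM for BERT-based taggers, the same perturbation is added in place to the word-embedding weights after the normal backward pass. A second forward/backward pass then accumulates the adversarial gradients, and the original weights are restored before the optimizer step.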
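The abstract reports an average F1 of 92.84% over five entity types but does not state how the average is taken. The sketch below shows one common way to score a BIO-tagged NER model at the entity level. An entity counts as correct only if both its span and its type match exactly, and the per-type F1 scores are then macro-averaged. The BIO scheme, the macro averaging, and the tag names (`SYM`, `DIS`) are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter


def bio_spans(tags):
    """Extract (type, start, end) entity spans from a BIO tag list; end is exclusive.

    An I- tag that does not continue an entity of the same type opens a new one.
    """
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # same-type I- tag extends the current entity
        if etype is not None:
            spans.append((etype, start, i))
        if tag == "O":
            start, etype = None, None
        else:
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Per-type F1 and their macro average, counting exact (type, span) matches."""
    tp, n_gold, n_pred = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_spans(gold)), set(bio_spans(pred))
        n_gold.update(t for t, _, _ in g)
        n_pred.update(t for t, _, _ in p)
        tp.update(t for t, _, _ in g & p)
    per_type = {}
    for t in set(n_gold) | set(n_pred):
        prec = tp[t] / n_pred[t] if n_pred[t] else 0.0
        rec = tp[t] / n_gold[t] if n_gold[t] else 0.0
        per_type[t] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    macro = sum(per_type.values()) / len(per_type) if per_type else 0.0
    return per_type, macro


if __name__ == "__main__":
    gold = [["B-SYM", "I-SYM", "O", "B-DIS"], ["B-DIS", "I-DIS", "O"]]
    pred = [["B-SYM", "I-SYM", "O", "O"], ["B-DIS", "I-DIS", "O"]]
    per_type, macro = entity_f1(gold, pred)
    print(per_type, round(macro, 4))
```

Micro-averaging is the other common choice: it pools true positives and counts across all types before computing a single F1. The two averages can differ noticeably when entity types have very different frequencies.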
Authors
ZHENG Li-rui (郑立瑞)
XIAO Xiao-xia (肖晓霞)
ZOU Bei-ji (邹北骥)
LIU Bin (刘彬)
ZHOU Zhan (周展)
(School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha 410208, China; School of Computer Science and Engineering, Central South University, Changsha 410083, China)
Source
Computer and Modernization (《计算机与现代化》)
2024, No. 1, pp. 87-91 (5 pages)
Funding
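2017 National Key Research and Development Program of the 13th Five-Year Plan, Ministry of Science and Technology (2017YFC1703300)
Sub-project of the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102102)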
Keywords
electronic medical record
named entity recognition
BERT
FGM
BILSTM (bidirectional long short-term memory network)
conditional random field