Abstract
Electronic medical records contain a wealth of medical information that is of great significance for the diagnosis and treatment of disease. Extracting this information with named entity recognition (NER) has become an active research topic. To extract entities from Chinese electronic medical records more efficiently and accurately, a BERT-BiLSTM-CRF named entity recognition model is proposed. The model builds on the traditional BiLSTM-CRF model by incorporating BERT character embeddings, which make better use of the surrounding context and account for problems such as polysemy. Experimental results show that the model performs well on named entity recognition in Chinese electronic medical records and, compared with existing named entity recognition methods, achieves clear improvements in accuracy, recall, and F1 score. Higher accuracy on this task is of great help for downstream work such as building medical knowledge graphs and medical knowledge bases.
Authors
李灵芳
杨佳琦
李宝山
杜永兴
胡伟健
LI Lingfang; YANG Jiaqi; LI Baoshan; DU Yongxing; HU Weijian (Information Engineering School, Inner Mongolia University of Science and Technology, Baotou 014010, China)
Source
《内蒙古科技大学学报》
CAS
2020, No. 1, pp. 71-77 (7 pages)
Journal of Inner Mongolia University of Science and Technology
Funding
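National Natural Science Foundation of China (61661044, 61961033)
Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (NJYT-19-A15)
Outstanding Youth Science Fund Project (2017YQL10)
Natural Science Foundation of Inner Mongolia Autonomous Region (2019MS06021)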
Keywords
Chinese named entity recognition
BERT model
Chinese electronic medical record
pre-trained language model