Abstract
[Purpose/significance] Medical entity extraction is a key step in information organization and knowledge mining in the medical and health field. Chinese medical entities are highly specialized, follow complex naming rules, and are difficult to extract, so this paper explores how several deep learning methods can be combined to improve the accuracy of Chinese medical entity extraction. [Method/process] First, building on the deep learning model BiLSTM-CRF, the study introduces the language model BERT and the iterated dilated convolutional neural network IDCNN to strengthen the semantic representation of text and the capture of local features. Second, it uses BERT pre-training to transfer knowledge from external medical corpus resources, achieving fusion of multiple semantic features. Third, it introduces a self-attention mechanism to capture important global contextual information and adds a Highway network to optimize the training of the deep network, addressing the loss of accuracy caused by increasing network depth. The result is the MF-HDL (Multi Feature-Hybrid Deep Learning) model. [Result/conclusion] On a Chinese diabetes dataset, the F1 value of MF-HDL is 18.42% and 17.18% higher than that of the baseline models IDCNN-CRF and BiLSTM-CRF, respectively, indicating that the method performs well on the Chinese medical entity extraction task.
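The Highway component mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is the generic Highway gating formula y = T(x) * H(x) + (1 - T(x)) * x written in plain Python, and the function names, the tanh transform, and the toy weights are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W x + b for a small dense layer."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    """Logistic function used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def highway(x: Vector, w_h: Matrix, b_h: Vector, w_t: Matrix, b_t: Vector) -> Vector:
    """One Highway layer (hypothetical helper, not from the paper).

    H(x) is a tanh transform of the input, T(x) is a sigmoid gate.
    The output mixes the transformed and the untouched input per dimension,
    so the layer can pass its input through unchanged when the gate is closed.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Toy example: identity transform weights, zero gate weights.
x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# A strongly negative gate bias closes the gate: the input is carried through.
carried = highway(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0])

# A strongly positive gate bias opens the gate: the output is tanh(x).
transformed = highway(x, identity, [0.0, 0.0], zeros, [50.0, 50.0])
```

The carry path is what makes deeper stacks easier to train: a layer whose gate stays near zero behaves like the identity, so adding it does not have to degrade the signal from earlier layers.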
Authors
韩普
顾亮
Han Pu; Gu Liang (School of Management, Nanjing University of Posts & Telecommunications, Nanjing 210003; Jiangsu Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023)
Source
《图书情报工作》
CSSCI
Peking University Core Journal
2022, Issue 14, pp. 119-127 (9 pages)
Library and Information Service
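Funding
National Social Science Fund of China project "Research on Entity Semantic Mining in the Health Domain in a Big Data Environment" (Grant No. 17CTQ022)
Jiangsu Province Postgraduate Research Innovation Program project "Research on Entity Recognition in Medical Literature Based on Deep Learning" (Grant No. KYCX21_0844). This work is one of the research outcomes of these projects.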