Abstract
Objective: To study named entity recognition (NER) of terminology in the ancient traditional Chinese medicine (TCM) text Pulse Classic (《脉经》) using deep learning. Methods: Pulse Classic contains a large number of specialized terms, has a complex knowledge system, and is difficult to segment into words. To address these problems, transfer learning was combined with BERT, the Pulse Classic data set was preprocessed, and the proposed model was compared with the BERT-CRF, BiLSTM-CRF, and BERT-BiLSTM-CRF models. Results: The BERT-BiLSTM-CRF-radical feature model constructed in this experiment achieved an F1 value of 84.77% for named entity recognition. Compared with the BERT-CRF, BiLSTM-CRF, and BERT-BiLSTM-CRF models, this model takes the specialized and distinctive nature of the TCM domain into account when constructing word vectors: it learns not only the context but also the radical features of entity words, and it performed best. Conclusion: The BERT-BiLSTM-CRF-radical feature model can effectively recognize the categories of named entities for terminology in ancient TCM texts and improves entity recognition accuracy on them, laying a technical foundation for subsequent knowledge graph construction and providing high-quality data support for clinical diagnosis.
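The abstract states that the model learns radical features of entity characters in addition to context, but it does not say how the two are combined. The following is a minimal sketch of one common approach, assuming the radical embedding is simply concatenated to each character's contextual vector before the sequence is passed to a BiLSTM-CRF tagger. The radical table `RADICALS`, the helper names, and the embedding sizes are all hypothetical, and the zero vectors stand in for real BERT outputs.

```python
import random

# Hypothetical, hand-written radical lookup for a few characters.
# A real system would load a full character-to-radical dictionary.
RADICALS = {"脉": "月", "肝": "月", "浮": "氵", "沉": "氵", "痛": "疒", "病": "疒"}
UNK = "<unk>"  # used for characters missing from the lookup


def build_radical_vocab(radical_map):
    """Assign an integer id to every radical; id 0 is reserved for UNK."""
    vocab = {UNK: 0}
    for radical in sorted(set(radical_map.values())):
        vocab[radical] = len(vocab)
    return vocab


def make_embedding_table(size, dim, seed=0):
    """Create a randomly initialised embedding table (a trainable layer in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(size)]


def radical_ids(text, radical_map, vocab):
    """Map each character of the text to the id of its radical."""
    return [vocab[radical_map.get(ch, UNK)] for ch in text]


def fuse(context_vectors, ids, table):
    """Concatenate each contextual vector with its radical embedding."""
    return [list(ctx) + table[i] for ctx, i in zip(context_vectors, ids)]


VOCAB = build_radical_vocab(RADICALS)
TABLE = make_embedding_table(len(VOCAB), dim=4)

text = "脉浮而痛"
ids = radical_ids(text, RADICALS, VOCAB)

# Stand-in for per-character BERT outputs (assumed 8-dimensional here).
context = [[0.0] * 8 for _ in text]
fused = fuse(context, ids, TABLE)

print(ids)
print(len(fused), len(fused[0]))
```

Characters that share a radical, such as 痛 and 病 (both 疒), receive the same radical vector. That shared signal is the intuition behind using radicals for TCM terminology, where symptom and body-part terms often cluster by radical.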
Authors
宋熹玥
冯鑫雅
胡为
刘伟
SONG Xiyue; FENG Xinya; HU Wei; LIU Wei (Hunan University of Chinese Medicine, Changsha 410208, China)
Source
《中医药信息》
2024, No. 7, pp. 1-6 (6 pages)
Information on Traditional Chinese Medicine
Funding
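Natural Science Foundation of Hunan Province (2022JJ30438)
Natural Science Foundation of Changsha (kq2202260)
Hunan Provincial Traditional Chinese Medicine Research Project (B2023039)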