Chinese Medical Named Entity Recognition Based on Multi-feature Embedding
Abstract Chinese medical named entity recognition (NER) models based on character representations carry only a single kind of embedding information and lack word-boundary and text-structure information. To address these problems, this paper proposes a medical NER model that integrates multi-feature embeddings. First, characters are mapped to fixed-length embedding representations. Second, external resources are introduced to construct a lexical feature that supplements each character with information about the potential words it may belong to. Third, in view of the pictographic nature of Chinese characters and the sequential nature of text, a character structure feature and a sequence structure feature are introduced, and convolutional neural networks encode these two structural features into radical-level and sentence-level word embeddings. Finally, the resulting feature embeddings are concatenated and fed into a long short-term memory (LSTM) network for encoding, and a conditional random field (CRF) layer outputs the entity predictions. Experiments on a self-built Chinese medical dataset and on the medical data provided by the CHIP_2020 task show that, compared with the baseline models, the proposed model, which integrates both lexical and text structure features, can effectively identify medical named entities.
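The last two steps of the abstract (concatenating the per-character feature embeddings and decoding tags with a CRF) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the helper names `concat_features` and `viterbi_decode`, the three-tag set, and all scores below are hypothetical, and the LSTM encoder and CNN feature encoders are omitted. Viterbi decoding is used here as the standard inference procedure for a linear-chain CRF; the abstract does not say how the paper's CRF layer is decoded.

```python
from typing import List, Sequence


def concat_features(*feature_seqs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate per-character feature vectors position by position."""
    lengths = {len(seq) for seq in feature_seqs}
    if len(lengths) != 1:
        raise ValueError("all feature sequences must cover the same characters")
    return [sum((list(vec) for vec in vecs), []) for vecs in zip(*feature_seqs)]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


# Hypothetical toy inputs: two characters, a 2-dim character embedding
# and a 1-dim lexical feature per character.
char_emb = [[1.0, 2.0], [3.0, 4.0]]
lexical_emb = [[5.0], [6.0]]
fused = concat_features(char_emb, lexical_emb)

# Hypothetical scores for a three-character sequence.
tags = ["O", "B", "I"]
emissions = [[0.1, 2.0, 0.0], [0.2, 0.1, 1.5], [1.0, 0.0, 0.9]]
transitions = [
    [0.0, 0.0, -5.0],  # from O: moving to I is heavily penalized
    [0.0, -1.0, 1.0],  # from B
    [0.0, -1.0, 0.5],  # from I
]
decoded = viterbi_decode(emissions, transitions, tags)
print(fused)    # [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]
print(decoded)  # ['B', 'I', 'I']
```

In a full model the emission scores would come from the LSTM encoder applied to the fused embeddings, and the transition scores would be learned jointly with it.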
Authors HUANG Jiange; JIA Zhen; ZHANG Fan; LI Tianrui (School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Chengdu 611756, China)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2023, Issue 6, pp. 243-250 (8 pages)
Funding National Natural Science Foundation of China (62176221).
Keywords Named entity recognition; Chinese medical text; Lexical information; Text structure features; Deep learning
