
Two-stage Medical Terminology Standardization Based on RoBERTa and T5
Abstract: Medical terminology standardization, as an important means of eliminating entity ambiguity, is widely used in the construction of knowledge graphs. Because the medical domain involves a large number of specialized terms and complex expressions, traditional matching models often struggle to reach high accuracy. To address this, a two-stage model combining semantic recall with precise ranking is proposed to improve medical terminology standardization. First, in the semantic recall stage, a semantic representation model, CL-BERT, is built from improved supervised contrastive learning and RoBERTa-wwm; CL-BERT generates semantic representation vectors for entities, and candidate standard terms are recalled by the cosine similarity between vectors. Second, in the precise ranking stage, T5 combined with prompt tuning is used to build a precise semantic matching model, with FGM adversarial training applied during training; the matching model then ranks each original term against its candidate standard terms to obtain the final standard term. Experiments on the public CCKS 2019 dataset achieve an F1 score of 0.9206, showing that the proposed two-stage model performs well and offers a new approach to medical terminology standardization.
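The abstract outlines a recall-then-rank pipeline: a RoBERTa-wwm-based encoder retrieves candidate standard terms by cosine similarity, and a prompted T5 matcher ranks them. The sketch below illustrates that idea at inference time only and is not the authors' implementation: the checkpoints (hfl/chinese-roberta-wwm-ext as a stand-in for the contrastively trained CL-BERT, Langboat/mengzi-t5-base as a stand-in Chinese T5), the mean pooling, the prompt template, and the "是" verbalizer are all illustrative assumptions, and the paper's supervised contrastive training, prompt tuning, and FGM adversarial training steps are omitted.

```python
# Minimal sketch of the two-stage recall-then-rank pipeline described in the
# abstract. Checkpoint names, pooling, and prompt wording are assumptions for
# illustration; CL-BERT training, prompt tuning, and FGM are not reproduced.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel, T5ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# --- Stage 1: semantic recall (stand-in for CL-BERT) -----------------------
enc_tok = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
encoder = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext").to(device).eval()

@torch.no_grad()
def embed(texts):
    """Mean-pooled, L2-normalized sentence embeddings."""
    batch = enc_tok(texts, padding=True, truncation=True, return_tensors="pt").to(device)
    hidden = encoder(**batch).last_hidden_state            # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, L, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean over real tokens
    return F.normalize(pooled, dim=-1)

standard_terms = ["肺恶性肿瘤", "胃恶性肿瘤", "2型糖尿病"]     # toy standard vocabulary
term_vecs = embed(standard_terms)

def recall(mention, k=2):
    """Top-k standard terms by cosine similarity to the mention."""
    sims = (embed([mention]) @ term_vecs.T).squeeze(0)
    return [standard_terms[i] for i in sims.topk(k).indices.tolist()]

# --- Stage 2: precise ranking with a prompted T5 ---------------------------
t5_tok = AutoTokenizer.from_pretrained("Langboat/mengzi-t5-base")   # placeholder checkpoint
t5 = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base").to(device).eval()

@torch.no_grad()
def match_score(mention, candidate):
    """Score a (mention, candidate) pair by how likely T5 is to answer '是' (yes)."""
    prompt = f"原词: {mention} 标准词: {candidate} 两者是否指同一术语?"
    inputs = t5_tok(prompt, return_tensors="pt").to(device)
    labels = t5_tok("是", return_tensors="pt").input_ids.to(device)
    loss = t5(**inputs, labels=labels).loss                 # mean NLL of the target
    return -loss.item()                                     # higher = better match

mention = "左肺上叶癌"
candidates = recall(mention)
print(max(candidates, key=lambda c: match_score(mention, c)))
```

In the paper's setting, the encoder would be CL-BERT trained with the improved supervised contrastive loss so that synonymous mentions cluster together, and T5 would be fine-tuned with prompt tuning and FGM adversarial training before this kind of pairwise scoring is reliable.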
Authors: ZHOU Jing (周景), CUI Can-Can (崔灿灿), WANG Meng-Di (王梦迪), WANG Ze-Min (王泽敏) (School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China; Beijing Smart Insight Technology Co., Ltd., Beijing 100080, China)
Source: Computer Systems & Applications (《计算机系统应用》), 2024, No. 1, pp. 280-288 (9 pages)
Keywords: medical terminology standardization; RoBERTa-wwm; contrastive learning; T5; prompt tuning; knowledge graph