Joint Entity and Relation Extraction for Tibetan Medicine Texts Based on Encoder-Decoder Architectures

Abstract: In the field of Tibetan medicine, accurately extracting medical entities and their relations from medical texts and structuring them as triples is of great significance for constructing Tibetan medicine knowledge graphs. However, existing methods mainly rely on general-purpose pre-trained models to process Tibetan medicine texts. These models do not adequately cover the specialized terminology of the Tibetan medicine domain and fall short in generalization and robustness. To address this, this paper proposes a new model based on an encoder-decoder architecture that incorporates a pointer mechanism. In the encoding phase, BERT and GloVe are used to generate rich embeddings, which are then fused to strengthen the model's understanding of medical-domain text. In the decoding phase, a Transformer decoder is combined with the pointer mechanism so that the model directly generates structured information about entities and relations. In addition, the paper introduces the concept of "similar spans" together with a corresponding penalty-based training strategy to further improve the model's entity recognition. Extensive experiments on CMeIE-V2 and the Tibetan medicine dataset TibetanAI_TMDisRE_v1.0, with comparisons against baseline models, demonstrate the superior performance and robustness of the proposed model.
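The abstract names the fused BERT/GloVe encoder and the pointer-based decoder but gives no formulas, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes fusion is concatenation followed by a linear projection and tanh, and that the pointer is scaled dot-product scoring of a decoder state against every encoder position, with separate start and end pointers selecting an entity span. Random arrays would stand in for real BERT and GloVe outputs; the names `fuse_embeddings`, `pointer_distribution` and `point_to_span` are hypothetical. The "similar span" penalty is not modeled because the abstract does not define it.

```python
import numpy as np


def fuse_embeddings(bert_emb, glove_emb, w_proj):
    """Hypothetical fusion: concatenate contextual (BERT-like) and static
    (GloVe-like) token vectors, then project to the model dimension."""
    fused = np.concatenate([bert_emb, glove_emb], axis=-1)  # (seq_len, d_bert + d_glove)
    return np.tanh(fused @ w_proj)                          # (seq_len, d_model)


def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()


def pointer_distribution(query, memory):
    """Score one decoder state against every encoder position (scaled dot
    product) and normalise into a distribution over input tokens."""
    scores = memory @ query / np.sqrt(memory.shape[-1])  # (seq_len,)
    return softmax(scores)


def point_to_span(start_query, end_query, memory):
    """Pick the (start, end) token pair with the highest joint probability,
    constrained so that end >= start."""
    p_start = pointer_distribution(start_query, memory)
    p_end = pointer_distribution(end_query, memory)
    best, best_score = (0, 0), -1.0
    for i in range(len(p_start)):
        for j in range(i, len(p_end)):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Toy usage: one-hot "encoder states", queries aimed at tokens 2 and 3.
memory = np.eye(4)
print(point_to_span(np.array([0, 0, 5.0, 0]), np.array([0, 0, 0, 5.0]), memory))  # → (2, 3)
```

Scoring start and end separately and then enforcing `end >= start` is one common way to let a pointer emit a span rather than a single token; in a full joint extraction model the decoder would additionally emit a relation label for each pair of pointed spans to form a triple.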
Authors: GAO Xing (高兴), Yongcuo (拥措) (School of Information Science and Technology, Tibet University, Lhasa 850000, China; Key Laboratory of Tibetan Information Technology and Artificial Intelligence, Lhasa 850000, China; Engineering Research Center for Tibetan Language Information Technology under the Ministry of Education, Lhasa 850000, China)
Source: Plateau Science Research (《高原科学研究》), CSCD, 2024, No. 4, pp. 115-128 (14 pages)
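Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116100); Project of the Science and Technology Department of the Tibet Autonomous Region (XZ202401JD0010).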
Keywords: encoder-decoder architecture; pointer mechanism; Tibetan medicine texts; joint entity and relation extraction
