Abstract
To address polysemy, nested entities, and overlapping triples in entity and relation extraction, this paper proposes RBGPL, a joint extraction model that integrates RoBERTa-WWM with a global pointer network. First, the RoBERTa-WWM pre-trained model is introduced, and its fusion of contextual information resolves the polysemy of words across different contexts. Second, the Global Pointer tagging scheme is adopted to handle nested entities. Finally, a global-pointer joint decoding model converts triple extraction into quintuple extraction, which resolves overlapping triples. On a self-built agricultural disease dataset, RBGPL achieved a precision of 76.23%, a recall of 91.18%, and an F1 score of 83.04%; its F1 score was the highest among the joint extraction models compared, indicating that it effectively handles polysemy and overlapping triples. In addition, the F1 scores for the two nesting-prone entity types, pathogen (Pathogeny) and crop name (Crop), increased by 3% and 18%, respectively, showing that entity nesting was substantially alleviated. The proposed method improves entity and relation extraction in the Chinese agricultural disease domain and can provide technical support for constructing knowledge graphs in this domain.
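The span-scoring idea behind global-pointer tagging can be illustrated with a minimal sketch. It assumes the commonly described formulation in which each entity type has its own score matrix over (start, end) token pairs and every span scoring above a threshold is kept; the abstract does not give the authors' implementation. The tokens, the hand-written scores, the `Disease` label, and the `decode_spans` helper are all hypothetical and serve only to show why nested spans can coexist.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def decode_spans(scores: List[List[float]], threshold: float = 0.0) -> List[Span]:
    """Return every span (i, j) with i <= j whose score exceeds the threshold."""
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i, n) if scores[i][j] > threshold]


# Toy input: hypothetical tokens and hand-written scores, not model output.
tokens = ["tomato", "late", "blight"]
n = len(tokens)
NEG = -10.0  # stands in for a confidently rejected span

# One score matrix per entity type (assumed layout).
crop_scores = [[NEG] * n for _ in range(n)]
disease_scores = [[NEG] * n for _ in range(n)]
crop_scores[0][0] = 3.2     # "tomato"
disease_scores[0][2] = 4.1  # "tomato late blight", which contains the crop span

entities: Dict[str, List[Span]] = {
    "Crop": decode_spans(crop_scores),
    "Disease": decode_spans(disease_scores),  # hypothetical label
}

for label, spans in entities.items():
    for i, j in spans:
        print(label, " ".join(tokens[i:j + 1]))
```

Because each type is decoded from its own matrix, the inner span and the outer span are both returned rather than competing for a single tag per token, which is the property that makes this style of tagging suitable for nested entities.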
Authors
王彤
张立杰
王铭
吴华瑞
朱华吉
杨英茹
王春山
WANG Tong; ZHANG Lijie; WANG Ming; WU Huarui; ZHU Huaji; YANG Yingru; WANG Chunshan (College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China; College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071001, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China; Shijiazhuang Academy of Agriculture and Forestry Sciences, Shijiazhuang 050041, China; Hebei Education Examinations Authority, Shijiazhuang 050091, China)
Source
《河北农业大学学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2024, No. 3, pp. 113-120, 129 (9 pages in total)
Journal of Hebei Agricultural University
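Funding
Natural Science Foundation of Hebei Province (F2022204004)
National Bulk Vegetable Industry Technology System Project (CARS-23-D07)
National Key Research and Development Program of China (2020YFD1100204)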