Abstract
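With the growth of the Internet, data is now presented in many different forms, and knowledge graphs have emerged as a better way to organize, manage, and understand massive amounts of information. The quality of a knowledge graph depends closely on its entities and the relations between them; this paper takes the entity perspective and studies entity recognition methods. Most current deep learning models recognize entities well, but they do not take contextual information into account when modeling semantics, and they are large, with many parameters, which leads to large errors between predicted and true results and to high energy consumption. This paper proposes a named entity recognition method that combines the ELECTRA model with a neural network model; it reduces energy consumption and speeds up training while also improving entity recognition accuracy. The combined model has three parts. First, the ELECTRA model is improved: the input text is processed with [cls] and [seq] to avoid blurred entity boundaries, then a random 15% Mask mechanism is applied, the generator makes its predictions, and the discriminator judges them, producing character vectors. Second, the character vectors are fed into a bidirectional long short-term memory network (BiLSTM), which enhances contextual semantics and then scores the sentence sequence. Finally, a conditional random field (CRF) layer finds the optimal label sequence. Experimental results show that, for entity recognition on a medical corpus, the method achieves an accuracy of 97.94%, a recall of 95.41%, an F1 score of 95.44%, and a precision of 95.46%, a clear improvement over existing methods.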
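The masking and discrimination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `mask_tokens`, `toy_generator`, and `discriminator_labels` are hypothetical helper names, the sentence is made up, and the random `toy_generator` only stands in for ELECTRA's generator, which is a small masked language model.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace about mask_rate of the positions, chosen at random, with [MASK]."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions

def toy_generator(masked, vocab, seed=0):
    """Hypothetical stand-in for the generator: fill each [MASK] with a random vocabulary item."""
    rng = random.Random(seed)
    return [rng.choice(vocab) if t == MASK else t for t in masked]

def discriminator_labels(original, generated):
    """Replaced-token detection targets: 1 where the generated token differs from the original."""
    return [int(o != g) for o, g in zip(original, generated)]

# Hypothetical medical sentence, one token per character (16 characters).
tokens = list("患者出现持续性头痛和发热症状三天")
masked, positions = mask_tokens(tokens)
filled = toy_generator(masked, vocab=sorted(set(tokens)))
labels = discriminator_labels(tokens, filled)
```

The discriminator is trained to predict `labels` for every position, not only the masked ones, which is what distinguishes this objective from plain masked language modeling.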
With the development of the Internet, data is now presented in many different ways, and the emergence of knowledge graphs has provided a better way to organize, manage, and understand massive amounts of information. The quality of a knowledge graph is inextricably linked to its entities and the relationships between them, so this paper investigates entity recognition methods from the entity perspective. At present, most deep learning models are effective, but they do not consider contextual information when modeling semantics, and they are large, with many parameters, resulting in large errors between predicted and true results and in high energy consumption. This paper therefore proposes a method that combines the ELECTRA model with a neural network model for named entity recognition, aiming to improve the accuracy of entity recognition. The combined model is divided into three parts. First, the ELECTRA model is improved: the input text is processed with [cls] and [seq] to avoid the problem of fuzzy entity boundaries; a random 15% Mask mechanism is then applied, the generator makes its predictions, and the discriminator judges them to form a word vector. Second, the word vector is fed into a BiLSTM, and the sentence sequence is scored after contextual semantic enhancement. Third, the optimal label sequence is found through the CRF layer. The experimental results show that, for entity recognition on a medical corpus, the method achieves an accuracy of 0.994, a recall of 0.99, an F1 value of 0.986, and a precision of 0.983. Compared with existing methods, the proposed method is effective.
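Finding the optimal label sequence with a CRF layer is commonly done with Viterbi decoding. The sketch below assumes that approach; the abstract does not state which decoder the authors use. The tag set, the emission scores (standing in for BiLSTM outputs), and the transition scores are all made up for illustration.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag path and its total score.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptr.append(pointers)
    # Trace the best path backwards from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]

tags = ["O", "B", "I"]            # hypothetical tag set
transitions = [
    [0, 0, -100],                 # O -> I is strongly penalized
    [0, 0, 1],                    # B -> I is encouraged
    [0, 0, 1],                    # I -> I is encouraged
]
emissions = [                     # made-up per-position tag scores
    [2, 1, 0],
    [0, 1, 2],
    [0, 0, 2],
]
path, best_score = viterbi(emissions, transitions)
decoded = [tags[i] for i in path]
```

With these scores, picking the best tag at each position independently would give O, I, I, which contains the penalized O to I transition; decoding over the whole sequence returns B, I, I instead. This is the sense in which a CRF layer helps with entity boundaries.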
Authors
佘文浩
李卫榜
杨茂
崔梦天
SHE Wen-hao; LI Wei-bang; YANG Mao; CUI Meng-tian (The Key Laboratory for Computer Systems of State Ethnic Affairs Commission, Southwest Minzu University, Chengdu 610041, China)
Source
《西南民族大学学报(自然科学版)》
CAS
2023, Issue 2, pp. 197-205 (9 pages)
Journal of Southwest Minzu University (Natural Science Edition)
Funding
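Southwest Minzu University 2021 Graduate "Innovative Research Project", Master's General Project (CX2021SP121)
Sichuan Provincial Social Science Research Planning Project (SC20B127).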