Abstract
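Named entity recognition (NER) can integrate data from the composite materials testing field and accurately extract key entity information, promoting the informatization of the industry and providing technical support for its development. To address the abundance of specialized terms and the boundary confusion found in this field, a domain NER model (BERT-AdBC) is proposed that combines adversarial training with BERT (bidirectional encoder representations from transformers) embeddings. First, because the data available in the composite materials testing field is small in scale, BERT embeddings strengthen domain transfer and yield sufficient semantic representations by fusing character vectors. Second, because sentences in this field are complicated and long, a self-attention mechanism is combined with a bidirectional long short-term memory (Bi-LSTM) network to better capture semantic relations across the context. Finally, adversarial training uses the information shared between the word segmentation task and the entity recognition task to resolve boundary confusion. Experimental results show that the proposed BERT-AdBC model outperforms traditional models on entity recognition in the composite materials testing field, with the overall evaluation metric F improving by up to 6.48%.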
Named entity recognition (NER) can integrate source data and extract the crucial entities in the field of composite materials testing, providing technical support for the development of the field. To address the problems of excessive terminology and boundary confusion in composite materials testing, an NER method based on adversarial training and BERT (bidirectional encoder representations from transformers) is proposed, and a BERT-AdBC model is designed. First, because the data set in the composite materials testing field is small, the representation of semantic relations is enhanced by the BERT embedding, and the transferability of model parameters is improved by the transformer structure. Then, the capture of semantic relationships in context is strengthened by the Bi-LSTM-CRF model combined with self-attention. Finally, the influence of boundary confusion on the NER task and the CWS (Chinese word segmentation) task is reduced by adversarial training. The experimental results show that the proposed model performs better than traditional models, with a maximum increase in F value of 6.48%.
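The abstract reports its gain in terms of an F value without defining it. The sketch below assumes the common convention for NER: an entity-level F1 score computed over exact span matches in BIO-tagged sequences. The tag scheme, the label names `MAT` and `METHOD`, and the helper names `extract_entities` and `f_score` are illustrative assumptions and are not taken from the paper.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence.

    Assumption: tags look like "B-MAT", "I-MAT", or "O". An "I-" tag that
    does not continue an open entity of the same type is ignored.
    """
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((start, i, etype))
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        else:
            start, etype = None, None
    return spans


def f_score(gold_tags, pred_tags):
    """Return (precision, recall, F1) over exactly matching entity spans."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical five-character sentence: the prediction truncates the second
# entity, so its boundary is wrong and the span does not count as correct.
gold = ["B-MAT", "I-MAT", "O", "B-METHOD", "I-METHOD"]
pred = ["B-MAT", "I-MAT", "O", "B-METHOD", "O"]
print(f_score(gold, pred))
```

Under exact-match scoring a boundary error costs both precision and recall, which is one reason boundary confusion matters for the metric the abstract reports.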
Authors
李洋
蔡红珍
邢林林
苏展鹏
LI Yang; CAI Hong-zhen; XING Lin-lin; SU Zhan-peng (School of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo 255000, China; Shandong Research Center of Engineering and Technology for Clean Energy, Zibo 255000, China; School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China)
Source
《科学技术与工程》
Peking University Core Journals (北大核心)
2022, Issue 30, pp. 13370-13377 (8 pages)
Science Technology and Engineering
Funding
National Key Research and Development Program of China (2018YFB1403302).