Named Entity Recognition in Food Field Based on BERT and Adversarial Training (基于BERT和对抗训练的食品领域命名实体识别)

Cited by: 15
Abstract: To extract effective entity information from unstructured corpora in the food domain, a named entity recognition (NER) method based on BERT (Bidirectional Encoder Representations from Transformers) and adversarial training is proposed. NER is a typical sequence labeling problem. Deep learning methods have been widely applied to this task with remarkable results, but NER in specific domains such as food faces two problems: large labeled sample sets are difficult to construct, and the boundaries of domain-specific terms are recognized inaccurately. To address these problems, BERT is used to obtain character vectors, which enrich the semantic representation. Adversarial training is then introduced so that the information shared between Chinese word segmentation (CWS) and NER improves the precision of entity boundary recognition, while noise from the private information of the CWS task is kept out. Experiments are conducted on corpora from two domains: Chinese food safety cases, used to train the NER task, and People's Daily news, used to train the CWS task. Adversarial training is used to improve the recognition of five entity types (person names, locations, organizations, food names, and additive names). Experimental results show that the precision, recall, and F1 score of the proposed method are 95.46%, 89.50%, and 92.38%, respectively; the method therefore improves the F1 score on Chinese NER in the food domain, where entity boundaries are indistinct.
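The three reported figures can be cross-checked against each other, since the F1 score is the harmonic mean of precision and recall. The short sketch below assumes the standard definition F1 = 2PR / (P + R); the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
precision, recall = 95.46, 89.50
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 92.38%"
```

The computed value agrees with the reported F1 of 92.38%, so the three numbers in the abstract are mutually consistent.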
Authors: DONG Zhe (董哲); SHAO Ruo-qi (邵若琦); CHEN Yu-liang (陈玉梁); ZHAI Wei-feng (翟维枫) (School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2021, Issue 5, pp. 247-253 (7 pages)
Funding: National Key R&D Program of China (2018YFC1602703); National Natural Science Foundation of China (61873006).
Keywords: Food field; Named entity recognition; BERT; BiLSTM; Adversarial training