Abstract
Named entity recognition (NER) is a key step in text information extraction. In recent years, NER methods that combine word (lexicon) information with character information have performed well and attracted wide attention. However, current character-word fusion strategies still suffer from poor transferability, loss of word information, and difficulty in determining word boundaries. To address these problems, this paper proposes a Chinese NER method based on dynamic fusion of character and word information. First, a multi-head self-attention mechanism dynamically fuses the word information and character information at each corresponding position to form word-set information. Second, the word-set information is dynamically fused to determine the corresponding word boundaries, and the word vectors are used to refine the character vector representations. Finally, a BiLSTM-CRF performs sequence decoding to identify named entities. Experiments on three public datasets, MSRA, ONTO, and WEIBO, show that the proposed method substantially outperforms character-granularity NER methods and can be combined effectively with pretrained models such as BERT. The method transfers well and fuses word and character information completely and dynamically, which improves NER performance.
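The fusion step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy lexicon, a hypothetical `toy_embed` function standing in for trained embeddings, and a simplified multi-head scaled dot-product attention with no learned projections, in which each character vector attends over itself and the vectors of the lexicon words that cover its position. The BiLSTM-CRF decoder is omitted.

```python
import math


def match_words(sentence, lexicon, max_len=4):
    """Return, for each character index, the lexicon words covering it."""
    covers = [[] for _ in sentence]
    for start in range(len(sentence)):
        longest_end = min(len(sentence), start + max_len)
        for end in range(start + 2, longest_end + 1):  # words of 2+ characters
            word = sentence[start:end]
            if word in lexicon:
                for i in range(start, end):
                    covers[i].append(word)
    return covers


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, candidates, num_heads=2):
    """Simplified multi-head attention (assumption: identity projections).

    Each head works on its own slice of the vectors; head outputs are
    concatenated back into a vector of the original dimension.
    """
    dim = len(query)
    assert dim % num_heads == 0, "dimension must divide evenly into heads"
    head_dim = dim // num_heads
    fused = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        scores = [
            sum(q * k for q, k in zip(query[lo:hi], cand[lo:hi]))
            / math.sqrt(head_dim)
            for cand in candidates
        ]
        weights = softmax(scores)
        for j in range(lo, hi):
            fused.append(sum(w * cand[j] for w, cand in zip(weights, candidates)))
    return fused


def toy_embed(text, dim=4):
    """Hypothetical deterministic stand-in for a trained embedding table."""
    seed = sum(ord(ch) for ch in text)
    return [math.sin(seed * (j + 1)) for j in range(dim)]


def fuse(sentence, lexicon, embed=toy_embed):
    """Fuse each character vector with the vectors of its covering words."""
    covers = match_words(sentence, lexicon)
    fused = []
    for ch, words in zip(sentence, covers):
        char_vec = embed(ch)
        candidates = [char_vec] + [embed(w) for w in words]
        fused.append(attend(char_vec, candidates))
    return covers, fused


# Toy lexicon and sentence, chosen only for illustration.
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
covers, fused = fuse("南京市长江大桥", lexicon)
for ch, words in zip("南京市长江大桥", covers):
    print(ch, words)
```

Because the attention weights are computed per character, a character covered by several competing words (for example 长, which lies inside 市长, 长江, and 长江大桥) receives a weighted mixture instead of a fixed choice, which is the sense in which the fusion is dynamic. A character with no matching word keeps its own vector unchanged.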
Authors
HU Nan (胡楠)
HUANG Ruiyang (黄瑞阳)
ZHANG Jianpeng (张建朋)
YU Shiyuan (余诗媛)
SU Ke (苏珂)
Affiliations: School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450000, China; National Digital Switching System Engineering and Technology Research Center, Zhengzhou 450000, China
Source
Journal of Information Engineering University (《信息工程大学学报》)
2022, Issue 4, pp. 452-459 (8 pages)
Funding
National Natural Science Foundation of China, Youth Fund project (62002384)
China Postdoctoral Science Foundation, General Program (47698)
Keywords
information extraction
named entity recognition
dynamic feature fusion
character-word fusion
multi-head self-attention