Abstract
Named entity recognition (NER) in the electric motor domain is difficult, and its accuracy is low. To address this, the paper proposes an NER model for the motor domain based on BERT and a multi-window gated CNN. First, the model uses the pre-trained BERT model to generate the character vector sequence of a sentence and dynamically fine-tunes the character vectors according to the context of motor-domain text, which strengthens their semantic representation. Second, it builds a dual-branch feature extraction layer, consisting of a global temporal feature perception unit and a multi-window gated CNN unit, to form a multi-level semantic feature representation of the sentence. Finally, a CRF decodes the character sequence to obtain the label of each character. In comparative experiments against several groups of models on a small, self-built motor-domain dataset, the proposed model outperforms all the other models and reaches a macro-F1 of 90.16%, which verifies the effectiveness of the method for entity recognition in the motor domain.
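The final CRF decoding step can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the function name `viterbi_decode`, the B/I/O label set, and all scores below are hypothetical values chosen for illustration. In the model described above, the per-character emission scores would come from the dual-branch feature extraction layer, and the transition scores would be learned by the CRF.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence for one sentence.

    emissions[t][y]   : score of label y at character position t
    transitions[p][y] : score of moving from label p to label y
    """
    if not emissions:
        return []
    # best[t][y] = best score of any path that ends in label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back = []  # back[t-1][y] = best previous label for y at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to the start.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-character example with illustrative scores.
LABELS = ["B", "I", "O"]
TRANSITIONS = {p: {y: 0.0 for y in LABELS} for p in LABELS}
TRANSITIONS["O"]["I"] = -10.0  # an entity cannot continue right after "O"
EMISSIONS = [
    {"B": 0.9, "I": 0.0, "O": 1.0},
    {"B": 0.0, "I": 1.0, "O": 0.2},
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, LABELS))  # ['B', 'I']
```

Choosing the best label at each position independently would give O followed by I, which is not a valid tag sequence. The transition penalty makes the decoder prefer B followed by I, which is the kind of sequence-level constraint a CRF layer adds on top of per-character scores.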
Authors
张智源 (Zhang Zhiyuan)
孙水华 (Sun Shuihua)
徐诗傲 (Xu Shi'ao)
徐凡 (Xu Fan)
刘建华 (Liu Jianhua)
Affiliations: Nanping Electric Power Supply Company, Nanping, Fujian 353000, China; College of Computer Science & Mathematics, Fujian University of Technology, Fuzhou 350118, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2023, Issue 1, pp. 107-114 (8 pages)
Funding
Natural Science Foundation of Fujian Province (2019J01061137)
Development Fund of Fujian University of Technology (GY-Z20046)