
A Lexicon-CBOW Based Named Entity Abbreviation Recognition Technology
Abstract Chinese named entity recognition (NER) identifies entities with specific meaning in text; an NER model generally consists of a word vector layer, a feature extraction layer, and an output layer. This paper focuses on how the word vectors are trained. The widely used CBOW and Skip-gram models train word vectors by using given words to predict the probability of a target word. Because training corpora are generally drawn from structured websites such as Baidu Baike and Weibo, where entities are expressed in relatively standard form, the resulting word vectors represent abbreviated entities with large error, which lowers named entity recognition accuracy. On top of predicting the target word from given words, the paper introduces entity label information: labeled characters are further segmented into words to refine their labels, and when labeled characters serve as context they undergo full-label masking and partial-label masking to simulate how abbreviations actually occur. The trained word vectors are tested on a high school admission planning question dataset that contains many abbreviated entities. The experimental results show a considerable improvement in recognition accuracy for abbreviated entities, which demonstrates the effectiveness of the model for representing abbreviated entities.
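The abstract does not spell out the masking procedure, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes character-level tokens with BIO-style labels, a hypothetical `[MASK]` placeholder, and that "partial-label masking" hides some characters of an entity while keeping others (for example keeping 江 and 大 from 江苏大学 to mimic the abbreviation 江大). The function names `mask_context` and `cbow_pairs`, the `keep` parameter, and the sample sentence are all illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_context(chars: List[str], labels: List[str], mode: str,
                 keep: Optional[Set[int]] = None) -> List[str]:
    """Mask labeled (non-"O") characters before they are used as context.

    mode="full":    every labeled character is masked.
    mode="partial": labeled characters are masked except positions in `keep`.
    """
    if mode not in ("full", "partial"):
        raise ValueError("mode must be 'full' or 'partial'")
    keep = keep or set()
    masked = []
    for i, (ch, lab) in enumerate(zip(chars, labels)):
        is_entity = lab != "O"
        hide = is_entity and (mode == "full" or i not in keep)
        masked.append(MASK if hide else ch)
    return masked


def cbow_pairs(chars: List[str], labels: List[str], window: int, mode: str,
               keep: Optional[Set[int]] = None) -> List[Tuple[List[str], str]]:
    """Build CBOW (context, target) pairs.

    The context is read from the masked sequence; the target is always
    the original, unmasked character.
    """
    masked = mask_context(chars, labels, mode, keep)
    pairs = []
    for i, target in enumerate(chars):
        left = masked[max(0, i - window):i]
        right = masked[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


# Illustrative sentence: "I want to apply to Jiangsu University".
chars = list("我想考江苏大学")
labels = ["O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG"]

full_pairs = cbow_pairs(chars, labels, window=1, mode="full")
partial_pairs = cbow_pairs(chars, labels, window=1, mode="partial", keep={3, 5})
```

Under these assumptions, the partial mode yields pairs in which a dropped entity character is predicted from the characters that survive in the abbreviation, while the full mode forces non-entity characters to be predicted without seeing the entity at all. The resulting pairs could then be fed to any CBOW trainer.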
Authors WU Jian; ZHU Xiaolong; ZHOU Conghua (School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013)
Source Computer & Digital Engineering, 2023, No. 6, pp. 1328-1332, 1386 (6 pages in total)
Keywords entity abbreviation; named entity recognition; entity label; word vector
