Abstract
Named entity recognition (NER) is an important step in constructing a subject-specific knowledge graph. In recent years, with the development of deep learning, NER performance in the general domain and in fields such as medicine has improved considerably. In the Java subject domain, however, knowledge points are numerous and varied, entities mix Chinese and English, and entities have distinctive internal features. As a result, general-purpose models achieve low recognition accuracy in this domain and cannot identify entity boundaries effectively. This paper first proposes an improved single-model structure: the embedding layer incorporates word boundary information together with part-of-speech information and rule information for Java-domain entity recognition, in order to improve the accuracy of entity boundary identification; the encoding layer uses BiLSTM and IDCNN to extract contextual information; and the decoding layer uses a CRF to obtain the globally optimal label sequence. Second, it proposes fusing the results of multiple heterogeneous single models so that they complement one another, in order to improve recognition performance and generalization. Experiments on a self-constructed Java-domain dataset show that the new single model improves the entity recognition F1 score by about 2 percentage points over mainstream models. Entity recognition performance also improves markedly after multi-model fusion, indicating that the approach is more effective on the Java-domain NER task.
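The abstract does not describe the decoding step in detail. As a minimal sketch of what "obtaining the globally optimal label sequence" in a CRF decoding layer usually involves, the following illustrates Viterbi decoding over per-token emission scores and tag-transition scores. The function name `viterbi_decode`, the BIO tag set, and all scores are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model.

    emissions[t][tag] is the score of `tag` at position t (assumed to come
    from an encoder such as BiLSTM/IDCNN); transitions[(prev, cur)] is the
    score of moving from `prev` to `cur` (missing pairs default to 0.0).
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag]: previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token span; O -> I is heavily penalized
# because an entity cannot continue without having begun.
TAGS = ["B", "I", "O"]
TRANSITIONS = {("O", "I"): -10.0}
EMISSIONS = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 1.0},
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

In this toy example, choosing the best tag independently per token would give O, I, O, which starts an entity with an inside tag. Decoding over the whole sequence instead selects B, I, O, which shows how sequence-level decoding can help with entity boundaries.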
Source
《计算机科学与应用》
2022, Issue 12, pp. 2712-2724 (13 pages)
Computer Science and Application