Abstract
Texts in the cross-border ethnic culture domain contain many domain-specific words, which makes it hard for models to extract domain information and leaves the contextual domain information incomplete. Entities in this domain are also densely distributed, which leads to overlapping entity relations. Because domain information plays an important role in the semantic representation of cross-border ethnic culture text, this paper proposes a pointer-annotation-based method for entity relation extraction in this domain. Domain lexicon information is integrated into the character vector representation to strengthen domain information and address inaccurate labeling of domain entities, and multi-layer pointer labeling is used to handle overlapping entity relations. Experimental results show that, on a cross-border ethnic culture entity relation extraction dataset, the proposed method improves the F1 value by 2.34% over the baseline method.
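The multi-layer pointer labeling mentioned in the abstract can be illustrated with a minimal decoding sketch. This is not the authors' implementation: the relation names, the token sequence, and the hand-written 0/1 pointer flags are all assumptions standing in for model outputs, and the lexicon-enhanced character representation is not shown. Each relation gets its own pair of start/end pointer layers, so one subject can take part in several relations without the tags colliding.

```python
from typing import Dict, List, Tuple

Layer = Tuple[List[int], List[int]]  # (start flags, end flags) for one relation


def decode_spans(starts: List[int], ends: List[int]) -> List[Tuple[int, int]]:
    """Pair each start flag with the nearest end flag at or after it."""
    spans = []
    for i, flag in enumerate(starts):
        if flag != 1:
            continue
        for j in range(i, len(ends)):
            if ends[j] == 1:
                spans.append((i, j))
                break
    return spans


def decode_objects(tokens: List[str], layers: Dict[str, Layer]) -> Dict[str, List[str]]:
    """Decode object spans separately for every relation layer."""
    result = {}
    for relation, (starts, ends) in layers.items():
        spans = decode_spans(starts, ends)
        if spans:
            result[relation] = [" ".join(tokens[i:j + 1]) for i, j in spans]
    return result


# Toy input (hypothetical): the subject "Dai people" is assumed to be known already.
tokens = ["Dai", "people", "celebrate", "Water", "Splashing", "Festival",
          "in", "Xishuangbanna"]
subject = "Dai people"

# One (start, end) layer per relation; the flags are hand-written for illustration.
layers = {
    "festival":   ([0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0]),
    "located_in": ([0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 1]),
    "language":   ([0] * 8, [0] * 8),  # no object predicted for this relation
}

objects = decode_objects(tokens, layers)
triples = [(subject, rel, obj) for rel, objs in objects.items() for obj in objs]
print(triples)
```

Because each relation is decoded from its own layer, the same subject yields two triples here, which is the overlapping case that a single flat tag sequence cannot express.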
Authors
杨振平
毛存礼
雷雄丽
黄于欣
张勇丙
YANG Zhenping; MAO Cunli; LEI Xiongli; HUANG Yuxin; ZHANG Yongbing (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; Kunming Metallurgical College, Kunming, Yunnan 650500, China)
Source
《中文信息学报》
CSCD
Peking University Core Journals
2024, No. 3, pp. 75-83 (9 pages)
Journal of Chinese Information Processing
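Funding
National Natural Science Foundation of China (62166023, 61866019)
Natural Science Foundation of Yunnan Province (2019FA023)
Yunnan Provincial Major Science and Technology Special Plan Projects (202103AA080015, 202002AD080001)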
Keywords
cross-border ethnic culture
entity relation extraction
pointer annotation
domain lexicon information