Abstract
Entity disambiguation, a key problem in natural language processing, aims to map ambiguous entity mentions in text to their target entities in a knowledge base. Existing methods disambiguate only a single mention at a time, ignore the effect of entity influence and of similarity between candidate entities on the result, and carry redundant graph nodes that increase the cost of graph computation. To address these problems, a domain entity disambiguation method that combines a multi-feature graph with entity influence is proposed. Taking the financial domain as an example, a financial knowledge base is built by extracting from CN-DBpedia the triples associated with finance-related keywords. For texts describing financial activities, the mentions to be disambiguated are extracted, and candidate entities are selected by fusing string and semantic similarity features. The knowledge base triples are then used to obtain the relations between candidate entities within 2 hops, and the similarity between candidate entities is computed as the edge weight, so that the multiple features are integrated into the graph model to form the multi-feature graph. Finally, a dynamic decision strategy is adopted: the PageRank algorithm is combined with entity influence to compute a comprehensive score for each candidate entity in the multi-feature graph, yielding highly reliable disambiguation results. Experimental results verify the precision and efficiency of the proposed method for entity disambiguation in a specific domain.
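The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the linear mix controlled by `alpha`, the `influence` values, and the entity names below are all assumptions made for illustration, and the dynamic decision strategy is not modeled. The sketch runs a weighted PageRank over a small candidate graph whose edge weights stand in for candidate similarity, then mixes the normalized rank with a per-entity influence prior.

```python
from collections import defaultdict


def weighted_pagerank(nodes, edges, damping=0.85, iters=50):
    """Power-iteration PageRank on an undirected, weighted graph.

    edges maps (u, v) pairs to a non-negative similarity weight.
    """
    node_set = set(nodes)
    # Build a symmetric weighted adjacency map, ignoring unknown nodes.
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        if u in node_set and v in node_set:
            adj[u][v] = w
            adj[v][u] = w
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Isolated node: spread its rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                # Distribute rank in proportion to edge weight.
                for v, w in adj[u].items():
                    new[v] += damping * rank[u] * w / total
        rank = new
    return rank


def disambiguate(candidates, edges, influence, alpha=0.7):
    """Pick one candidate per mention by a combined score.

    candidates: mention -> list of candidate entity ids
    influence:  entity id -> prior in [0, 1] (hypothetical values)
    alpha:      assumed mixing weight between graph rank and influence
    """
    # De-duplicate candidates shared by several mentions, keeping order.
    nodes = list(dict.fromkeys(c for cs in candidates.values() for c in cs))
    rank = weighted_pagerank(nodes, edges)
    top = max(rank.values())
    result = {}
    for mention, cands in candidates.items():
        score = {
            c: alpha * rank[c] / top + (1 - alpha) * influence.get(c, 0.0)
            for c in cands
        }
        result[mention] = max(score, key=score.get)
    return result


# Hypothetical toy data: two mentions, two candidates each.
candidates = {
    "Apple": ["Apple_Inc", "Apple_fruit"],
    "Jobs": ["Steve_Jobs", "Jobs_film"],
}
edges = {
    ("Apple_Inc", "Steve_Jobs"): 0.9,
    ("Apple_Inc", "Jobs_film"): 0.3,
    ("Apple_fruit", "Jobs_film"): 0.05,
}
influence = {"Apple_Inc": 0.8, "Apple_fruit": 0.5,
             "Steve_Jobs": 0.7, "Jobs_film": 0.2}

result = disambiguate(candidates, edges, influence)
print(result)
```

Because all mentions in a text share one graph, strongly connected candidates reinforce each other, which is how joint disambiguation of several mentions differs from scoring each mention in isolation.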
Authors
SHAN Xiaohuan (单晓欢)
QI Xin'ao (齐鑫傲)
SONG Baoyan (宋宝燕)
ZHANG Haolin (张浩林)
College of Information, Liaoning University, Shenyang 110036, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journals
2023, No. 5, pp. 305-311 (7 pages)
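Funding
National Key Research and Development Program of China (2019YFC0850103)
National Natural Science Foundation of China (61472169, 61502215, 62072220)
Joint Funds of the National Natural Science Foundation of China (U1811261)
China Postdoctoral Science Foundation, General Program (2020M672134)
Scientific Research Project of the Education Department of Liaoning Province (LJC201913)
Liaoning Provincial Engineering Laboratory of Big Data Systems for Public Opinion and Network Security (04-2016-0089013)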
Keywords
domain entity disambiguation
entity linking
multi-feature graph
entity influence
knowledge base