Funding: Project supported by the Key-Area Research and Development Program of Guangdong Province, China (No. 2019B010153002); the Program of Marine Economy Development (Six Marine Industries) Special Foundation of Department of Natural Resources of Guangdong Province, China (No. GDNRC[2020]056); the National Natural Science Foundation of China (No. 62002071); the Top Youth Talent Project of Zhujiang Talent Program, China (No. 2019QN01X516); and the Guangdong Provincial Key Laboratory of Cyber-Physical System, China (No. 2020B1212060069).
Abstract: Entity linking (EL) is a fundamental task in natural language processing. Existing neural EL systems concentrate on constructing the global model, but ignore latent semantic information in the local model and the acquisition of effective entity type information. In this paper, we propose two adaptive features: the first enables the local and global models to capture latent information, and the second describes effective information for entity type embeddings. These adaptive features work together naturally to handle some uncertain entity type information for EL. Experimental results demonstrate that our EL system achieves the best performance on the AIDA-B and MSNBC datasets, and the best average performance on out-of-domain datasets. These results indicate that the proposed adaptive features, which are based on their own diverse contexts, can capture information that is conducive to EL.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61806020, 61772082, 61972047, 61702296), the National Key Research and Development Program of China (2017YFB0803304), the Beijing Municipal Natural Science Foundation (4182043), the CCF-Tencent Open Fund, and the Fundamental Research Funds for the Central Universities.
Abstract: Entity set expansion (ESE) aims to expand an entity seed set to obtain more entities that share common properties. ESE is important for many applications, such as dictionary construction and query suggestion. Traditional ESE methods relied heavily on the text and Web information of entities. Recently, some ESE methods have employed knowledge graphs (KGs) to extend entities. However, they failed to effectively and efficiently utilize the rich semantics contained in a KG and ignored the text information of entities in Wikipedia. In this paper, we model a KG as a heterogeneous information network (HIN) containing multiple types of objects and relations. Fine-grained multi-type meta paths are proposed to capture the hidden relations among seed entities in a KG and thus to retrieve candidate entities. We then rank the entities according to the meta-path-based structural similarity. Furthermore, to utilize the text descriptions of entities in Wikipedia, we propose an extended model, CoMeSE++, which combines the structural information revealed by a KG with the text information in Wikipedia for ESE. Extensive experiments on real-world datasets demonstrate that our model achieves better performance by combining structural and textual information of entities.
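The idea of retrieving and ranking candidates by the path patterns they share with the seeds can be illustrated with a small sketch. This is not the CoMeSE++ model: the toy triples, the `build_features` and `expand` helpers, and the scoring rule (each shared pattern weighted by the fraction of seeds that exhibit it) are all assumptions made for illustration. Each entity is described by coarse patterns (it has a given relation) and fine patterns (it reaches a specific object through that relation), which only loosely stands in for the fine-grained multi-type meta paths mentioned in the abstract.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples; all data is hypothetical.
TRIPLES = [
    ("Beijing", "capital_of", "China"),
    ("Paris", "capital_of", "France"),
    ("Tokyo", "capital_of", "Japan"),
    ("Berlin", "capital_of", "Germany"),
    ("Lyon", "located_in", "France"),
    ("Osaka", "located_in", "Japan"),
    ("Beijing", "instance_of", "City"),
    ("Paris", "instance_of", "City"),
    ("Tokyo", "instance_of", "City"),
    ("Berlin", "instance_of", "City"),
    ("Lyon", "instance_of", "City"),
    ("Osaka", "instance_of", "City"),
]


def build_features(triples):
    """Map each head entity to its set of path-like patterns."""
    feats = defaultdict(set)
    for head, rel, tail in triples:
        feats[head].add((rel,))       # coarse pattern: entity has this relation
        feats[head].add((rel, tail))  # fine pattern: entity reaches this exact tail
    return feats


def expand(seeds, triples, top_k=3):
    """Rank non-seed entities by the seed-weighted patterns they share."""
    feats = build_features(triples)
    # Weight each pattern by the fraction of seeds that exhibit it.
    weight = defaultdict(float)
    for seed in seeds:
        for pattern in feats.get(seed, ()):
            weight[pattern] += 1.0 / len(seeds)
    scores = {}
    for entity, patterns in feats.items():
        if entity in seeds:
            continue
        scores[entity] = sum(weight[p] for p in patterns if p in weight)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    for entity, score in expand({"Beijing", "Paris"}, TRIPLES):
        print(entity, score)
```

With the seeds Beijing and Paris, the other capitals score higher than Lyon and Osaka because they share the `capital_of` pattern in addition to being cities. A real system would also have to handle longer paths and the textual signal that the abstract attributes to Wikipedia descriptions.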
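Abstract: Fine-grained entity type classification (FETC) aims to map entities mentioned in text to hierarchical, fine-grained entity types. In recent years, deep neural networks have brought substantial progress to entity classification. However, training a neural network model that recognizes types accurately requires a sufficient amount of labeled data, and labeled corpora for fine-grained entity classification are very scarce, so classifying entities in domains without labeled corpora is a difficult problem. For entity classification tasks that lack labeled corpora, this paper proposes a fine-grained entity classification method based on transfer learning. First, a mapping model is built to mine the semantic relations between entity types that have labeled corpora and entity types that do not; for each unlabeled entity type, a mapped set of corresponding labeled types is constructed. Then, a bidirectional long short-term memory (BiLSTM) model is built, and the combination of sentence vectors representing the mapped type set is used as the model input to train the unlabeled entity type. An attention mechanism is constructed from the semantic distance between each type in the mapped set and the corresponding unlabeled type, yielding an entity classifier that recognizes unseen entity types. Experiments show that the method achieves good results and can recognize unseen named entity types without any labeled corpus.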
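The attention step described above, which weights each mapped labeled type by its semantic closeness to the unlabeled type, can be sketched as follows. The abstract does not specify the distance function or the normalization, so the use of cosine similarity followed by a softmax is an assumption, as are the helper names and the toy vectors; the BiLSTM encoder that would produce the sentence vectors is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attention_weights(target_vec, mapped_vecs):
    """Softmax over similarities: closer mapped types get larger weights."""
    sims = [cosine(target_vec, m) for m in mapped_vecs]
    max_sim = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - max_sim) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def combine(sentence_vecs, weights):
    """Attention-weighted sum of the sentence vectors of the mapped types."""
    dim = len(sentence_vecs[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vecs))
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Hypothetical embeddings: one unlabeled type and two mapped labeled types.
    unlabeled_type = [1.0, 0.0]
    mapped_types = [[0.9, 0.1], [0.0, 1.0]]
    # Hypothetical sentence vectors, one per mapped type.
    sentences = [[0.5, 0.5], [0.2, 0.8]]

    weights = attention_weights(unlabeled_type, mapped_types)
    print(weights)
    print(combine(sentences, weights))
```

In this toy case the first mapped type is nearly parallel to the unlabeled type, so it receives the larger weight and dominates the combined representation.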