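Abstract: The Web today contains large amounts of missing, inconsistent, and imprecise data, while traditional search engines can only return document snippets that match keywords and cannot directly retrieve target entities. This paper proposes a new graph-matching-based entity extraction algorithm, GMEE (Graph Matching Based Entity Extraction). The algorithm first segments the snippets into words and performs a preliminary screening of entities; it then builds a "weighted semantic entity association graph" from the structural and semantic relations among the entities; finally, it extracts the target entities with a "maximum common subgraph matching" strategy. Experimental results show that, without requiring large-scale parameter training and transfer, the proposed algorithm effectively prunes the extracted entity set, maintaining recall and precision while improving the interpretability of the extraction process.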
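The three steps named in the abstract (segmentation and screening, graph construction, common-subgraph matching) can be illustrated with a minimal sketch. This is not the GMEE implementation: the abstract does not say how edge weights are computed, so the sketch assumes a simple co-occurrence count, whitespace tokenization, and uniquely labelled entity nodes. Under that last assumption, the common subgraph of two graphs reduces to their shared edges. The names `build_entity_graph` and `common_subgraph` and all sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_entity_graph(snippets, candidates):
    """Return {frozenset({a, b}): weight}; weight = number of snippets where a and b co-occur.

    Assumption: co-occurrence counting stands in for the paper's
    (unspecified) structural and semantic weighting.
    """
    weights = Counter()
    for snippet in snippets:
        tokens = set(snippet.lower().split())  # naive whitespace segmentation
        present = sorted(c for c in candidates if c in tokens)  # preliminary screening
        for a, b in combinations(present, 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


def common_subgraph(g1, g2):
    """Edges present in both graphs, keeping the smaller of the two weights.

    With uniquely labelled nodes, the common subgraph is the edge intersection.
    """
    return {edge: min(w, g2[edge]) for edge, w in g1.items() if edge in g2}


# Hypothetical snippets, as if returned for two related queries.
snippets_a = [
    "paris is the capital of france",
    "france and paris host the louvre",
]
snippets_b = [
    "the louvre is in paris france",
    "berlin is the capital of germany",
]
candidates = {"paris", "france", "louvre", "berlin", "germany"}

graph_a = build_entity_graph(snippets_a, candidates)
graph_b = build_entity_graph(snippets_b, candidates)
shared = common_subgraph(graph_a, graph_b)

for edge, weight in sorted(shared.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(edge), weight)
```

Entities that appear only in one graph (here `berlin` and `germany`) drop out of the shared subgraph, which is one way the entity set could be pruned without any trained parameters.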
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018AAAO10190, the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20191420, the National Natural Science Foundation of China under Grant No. 61632016, the Natural Science Research Project of Jiangsu Higher Education Institution under Grant No. 17KJA520003, the Priority Academic Program Development of Jiangsu Higher Education Institutions, and the Suda-Toycloud Data Intelligence Joint Laboratory.
Abstract: Entity linking (EL) is the task of determining the identity of textual entity mentions given a predefined knowledge base (KB). Many existing approaches to this task use either "local" information (contextual information about the mention in the text) or "global" information (relations among candidate entities). However, either kind of information may be insufficient, especially when the given text is short. To obtain richer local and global information for entity linking, we propose to enrich the context of mentions with extra contexts retrieved from the web through web search engines (WSE). Based on this intuition, we make two novel attempts. The first adds web search results to an embedding-based method to expand the mention's local information; here we try two methods for generating high-quality web contexts: applying an attention mechanism and using abstract extraction. The second uses the web contexts to extend the global information, i.e., it finds and uses additional relevant mentions from the web contexts with a graph-based model. Finally, we combine the two proposed models to use both the extended local and the extended global information from the extra web contexts. Our empirical study on six real-world datasets shows that using extra web contexts to extend the local and global information effectively improves the F1 score of entity linking.
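The idea of expanding a short mention's local context with web snippets can be shown in a minimal sketch. It is not the paper's model: it replaces the embedding-based method with bag-of-words cosine similarity, and the web contexts are hard-coded strings rather than real search engine results. The names `bow`, `cosine`, and `link_scores`, the candidate descriptions, and the snippet text are all hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector from naive whitespace tokenization."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_scores(mention_context, candidates, web_contexts=()):
    """Score each KB candidate against the mention context plus optional web contexts."""
    enriched = bow(" ".join([mention_context, *web_contexts]))
    return {name: cosine(enriched, bow(desc)) for name, desc in candidates.items()}


# Hypothetical KB entries with short descriptions.
candidates = {
    "Jordan (country)": "country in the middle east with capital amman",
    "Michael Jordan (basketball)": "american basketball player for the chicago bulls in the nba",
}
short_text = "jordan scored again"

# Stand-in for snippets a web search engine might return for the short text.
web_contexts = ["jordan nba highlights chicago bulls"]

plain = link_scores(short_text, candidates)
enriched = link_scores(short_text, candidates, web_contexts)
best = max(enriched, key=enriched.get)
print(plain)
print(best)
```

In this toy case the short text alone shares no words with either description, so both candidates score zero; only the added web context separates them.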
Funding: Supported by the National Natural Science Foundation of China (No. 60736044 & 60773066) and the Postdoctoral Funds of Heilongjiang
Abstract: A web-based translation method for Chinese organization names is proposed. After analyzing the structure of Chinese organization names, methods for bilingual query formulation and maximum-entropy-based translation re-ranking are proposed to retrieve the English translation from the web via a public search engine. Experiments on Chinese university names demonstrate the validity of this approach.
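A maximum entropy re-ranker scores each candidate with a log-linear model, p(e) proportional to exp(sum_i w_i * f_i(e)), and sorts by the resulting probability. The sketch below shows only that general form. The abstract does not list the paper's features or weights, so the feature names, the hand-set weights, and the two candidate translations are hypothetical; in practice the weights would be learned from training data.

```python
import math


def maxent_rerank(candidates, weights):
    """Rank candidates by a log-linear (maximum entropy) distribution over feature sums."""
    scores = {
        name: sum(weights.get(feature, 0.0) * value for feature, value in feats.items())
        for name, feats in candidates.items()
    }
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant
    probs = {name: math.exp(s) / z for name, s in scores.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical, hand-set weights (a trained model would estimate these).
weights = {"cooccurs_with_query": 1.5, "length_ratio": 0.5}

# Hypothetical candidate translations with made-up feature values.
candidates = {
    "Harbin Industry University": {"cooccurs_with_query": 0.2, "length_ratio": 0.8},
    "Harbin Institute of Technology": {"cooccurs_with_query": 1.0, "length_ratio": 0.9},
}

ranked = maxent_rerank(candidates, weights)
for name, prob in ranked:
    print(f"{name}: {prob:.3f}")
```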