The meaning of a word includes a conceptual meaning and a distributional meaning. Word embedding based on distribution suffers from insufficient conceptual semantic representation caused by data sparsity, especially for low-frequency words. In knowledge bases, manually annotated semantic knowledge is stable and the essential attributes of words are accurately denoted. In this paper, we propose a Conceptual Semantics Enhanced Word Representation (CEWR) model, which computes the synset embedding and hypernym embedding of Chinese words based on the Tongyici Cilin thesaurus and aggregates them with the distributed word representation, so that both distributional information and conceptual meaning are encoded in the representation of words. We evaluate the CEWR model on two tasks: word similarity computation and short text classification. The Spearman correlations between model results and human judgement improve to 64.71%, 81.84%, and 85.16% on Wordsim297, MC30, and RG65, respectively. Moreover, CEWR improves the F1 score by 3% in the short text classification task. The experimental results show that CEWR represents words more informatively than distributed word embedding alone. This indicates that conceptual semantics, especially hypernym information, is a good complement to distributed word representation.
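The abstract above does not say how the three embeddings are aggregated, so the following is only a minimal sketch under stated assumptions: each part is L2-normalized and the parts are concatenated (one plausible choice, not necessarily the one CEWR uses), word similarity is scored by cosine, and agreement with human ratings is measured by Spearman correlation. The function names `aggregate`, `cosine`, and `spearman` are hypothetical, and any vectors passed in would be toy inputs rather than data from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def aggregate(word_vec, synset_vec, hypernym_vec):
    """Combine distributional, synset, and hypernym vectors.

    Assumption: concatenation of normalized parts. The abstract only says
    the embeddings are "aggregated"; it does not specify the operator.
    """
    return (l2_normalize(word_vec)
            + l2_normalize(synset_vec)
            + l2_normalize(hypernym_vec))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0
```

In use, one would build an aggregated vector per word, score each word pair of a benchmark with `cosine`, and pass the model scores and the human ratings to `spearman`. Concatenation keeps the three information sources in separate dimensions; a weighted sum would be another option when all parts share the same dimensionality.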
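[Purpose/Significance] Hypernymy describes the "is-a" relation between concepts. It is an important cornerstone of taxonomies, ontologies, and knowledge graphs, and is widely applied in natural language processing. This paper analyzes the research progress, related resources, and applications of hypernymy detection from text corpora, to provide a reference for researchers in related fields. [Method/Process] Using content analysis, with Web of Science, VIP (维普), and CNKI (中国知网) as information sources, the paper reviews and analyzes the published research on hypernymy detection. [Result/Conclusion] Hypernymy detection has made some progress but is far from solved, and it needs further exploration and research. Finally, the paper gives suggestions for hypernymy detection research from five aspects: research methods, benchmarks and evaluation, domain knowledge, language, and applications.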
Funding: This research is supported by the National Science Foundation of China (grant number: 61772278, author: Qu, W.; grant number: 61472191, author: Zhou, J.; http://www.nsfc.gov.cn/), the National Social Science Foundation of China (grant number: 18BYY127, author: Li, B.; http://www.cssn.cn), the Philosophy and Social Science Foundation of Jiangsu Higher Institution (grant number: 2019SJA0220, author: Wei, T.; https://jyt.jiangsu.gov.cn), and Jiangsu Higher Institutions' Excellent Innovative Team for Philosophy and Social Science (grant number: 2017STD006, author: Gu, W.; https://jyt.jiangsu.gov.cn).