Abstract
Lexical entailment recognition has been studied extensively for English, and many recognition models have been proposed, but research on acquiring Chinese lexical entailment relations remains scarce. To address this gap, this paper proposes a word-vector-based method for recognizing Chinese lexical entailment. Word vectors are trained on a Chinese Wikipedia corpus so that each word is represented as a vector. A set of word-vector-based classification features is then designed, and a Support Vector Machine (SVM) classifier for Chinese noun lexical entailment is trained on a manually constructed Chinese lexical entailment data set. Experimental results show that, compared with the traditional cosine similarity method, the proposed method and the designed classification features have a clear advantage in lexical entailment recognition.
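The abstract does not list the individual classification features or the SVM settings, so the following is only a minimal sketch under stated assumptions. The feature set (vector difference, element-wise product, and cosine), the helper names `pair_features`, `train_linear_svm`, and `predict`, and the toy word vectors are all hypothetical, not the paper's. The linear SVM is trained with a bias-free, Pegasos-style stochastic subgradient method in pure Python rather than with a library solver. The point it illustrates is why vector-based features can go beyond the cosine baseline: cosine gives the same score to (u, v) and (v, u), while lexical entailment is directional.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: a symmetric score like the baseline in the abstract."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Hypothetical features for the ordered pair (u, v), i.e. "does u entail v?".

    Concatenates the difference (v - u), the element-wise product (u * v) and
    the cosine score. The difference part is order-sensitive, so (u, v) and
    (v, u) receive different features, which cosine alone cannot provide.
    """
    diff = [b - a for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return diff + prod + [cosine(u, v)]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Bias-free linear SVM (hinge loss + L2 penalty) trained with
    Pegasos-style stochastic subgradient descent. Labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 means "entailment", -1 means "no entailment"."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


if __name__ == "__main__":
    # Hypothetical 3-d "word vectors"; real ones would be trained on a
    # corpus such as Chinese Wikipedia.
    vec = {
        "poodle": [0.9, 0.1, 0.3],
        "dog": [0.7, 0.3, 0.2],
        "animal": [0.5, 0.5, 0.1],
    }
    pairs = [("poodle", "dog", 1), ("dog", "animal", 1),
             ("dog", "poodle", -1), ("animal", "dog", -1)]
    X = [pair_features(vec[a], vec[b]) for a, b, _ in pairs]
    y = [label for _, _, label in pairs]
    w = train_linear_svm(X, y)
    for (a, b, _), x in zip(pairs, X):
        print(f"{a} -> {b}: svm={predict(w, x):+d} "
              f"cosine={cosine(vec[a], vec[b]):.3f}")
```

Note that the cosine column printed for "poodle -> dog" and "dog -> poodle" is identical, while the SVM features differ. A real system would more likely use an established SVM package (for example LIBSVM or scikit-learn's `SVC`) and vectors from a trained embedding model; the pure-Python trainer only keeps this sketch dependency-free.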
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2016, No. 2, pp. 169-174 (6 pages)
Funding
National Natural Science Foundation of China (61163039, 61163036, 61363058)
Young Teachers' Scientific Research Capability Enhancement Program of Northwest Normal University (NWNU-LKQN-10-2, NWNU-LKQN-12-23)
Keywords
textual entailment
lexical entailment
word vector
entailment feature
Support Vector Machine (SVM)