Abstract
Feature selection in the traditional simple vector distance classification method ignores semantic differences between words at different levels of abstraction. To address this problem, this paper proposes a simple vector distance classification method based on ontology semantics. With the support of an ontology library, the method effectively incorporates linguistic knowledge into the vector space representation of text and further uncovers the deep semantic relations among the concepts of feature terms; the resulting semantic feature vectors are used as the final text feature vectors. The paper also defines how to compute semantic similarity across different levels of abstraction based on a domain ontology, and applies this similarity to the simple vector distance classification algorithm. Experiments on the CWT20G dataset show that the ontology-semantics-based simple vector distance classification algorithm distinguishes synonyms, polysemous words, and hypernym-hyponym pairs more effectively, and that classification accuracy improves steadily as the semantic analysis deepens.
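The general idea can be illustrated with a minimal sketch. The abstract does not give the paper's similarity formula or ontology, so everything below is an assumption made for illustration: the toy `PARENT` hierarchy, the path-based `concept_similarity` measure, and the soft-cosine style `semantic_cosine` are hypothetical stand-ins, not the authors' definitions. The sketch only shows how an ontology-derived similarity could be plugged into a centroid-based (simple vector distance) classifier.

```python
import math

# Hypothetical toy ontology: child concept -> parent concept (not from the paper).
PARENT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "apple": "fruit",
    "banana": "fruit",
    "vehicle": "entity",
    "fruit": "entity",
}

def ancestors(concept):
    """Return the path from a concept up to the ontology root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def concept_similarity(a, b):
    """Assumed path-based measure: 1 / (1 + number of edges between a and b)."""
    if a == b:
        return 1.0
    path_a, path_b = ancestors(a), ancestors(b)
    for depth_a, node in enumerate(path_a):
        if node in path_b:  # lowest common ancestor
            return 1.0 / (1.0 + depth_a + path_b.index(node))
    return 0.0  # no common ancestor

def soft_dot(u, v):
    """Dot product in which every pair of terms is weighted by its similarity."""
    return sum(wu * wv * concept_similarity(tu, tv)
               for tu, wu in u.items() for tv, wv in v.items())

def semantic_cosine(u, v):
    """Cosine similarity computed with the similarity-weighted dot product."""
    denom = math.sqrt(soft_dot(u, u) * soft_dot(v, v))
    return soft_dot(u, v) / denom if denom else 0.0

def centroid(docs):
    """Average the term-weight vectors of one class into a class center."""
    total = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in total.items()}

def classify(doc, centroids):
    """Assign the document to the class whose center is semantically closest."""
    return max(centroids, key=lambda label: semantic_cosine(doc, centroids[label]))

# Toy training data with made-up term weights.
centroids = {
    "transport": centroid([{"car": 1.0, "truck": 1.0}]),
    "food": centroid([{"apple": 1.0, "banana": 1.0}]),
}
label = classify({"automobile": 1.0}, centroids)
```

In this toy setup the query shares no literal term with either class, so a plain cosine score would be zero for both; the ontology-derived similarity is what lets "automobile" be matched to the class containing "car" and "truck".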
Source
Journal of Beijing Institute of Petrochemical Technology (《北京石油化工学院学报》)
2007, No. 3, pp. 13-17 (5 pages)
Funding
Supported by a grant from the Beijing Municipal Commission of Education
Project No.: KM200610017007