Abstract
Text clustering is a highly automated unsupervised machine learning method that supports the effective organization, summarization, and navigation of text information, and in recent years it has been widely applied in information retrieval. To address the shortcomings of vector space model clustering in handling synonyms and polysemous words, the author proposes an ontology-based text clustering model. First, the words in each document are semantically annotated using the WordNet dictionary to obtain the document's concept set. Next, concept clustering is performed on each document's concept set to generate the document's concept topics. Finally, text clustering is completed by computing the similarity between topics. The model reduces the amount of similarity computation and improves clustering results and clustering performance.
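The three stages named in the abstract (semantic annotation, per-document concept clustering, topic similarity) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The tiny `TOY_LEXICON` and `CONCEPT_TOPIC` tables are hypothetical stand-ins for WordNet and for the concept clustering step, and the cosine measure, the `threshold` parameter, and the greedy merging rule are all assumptions chosen only to keep the example self-contained.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for WordNet: word -> concept (synonyms share one concept).
TOY_LEXICON = {
    "car": "automobile", "auto": "automobile", "automobile": "automobile",
    "engine": "motor", "motor": "motor",
    "doctor": "physician", "physician": "physician",
    "nurse": "nurse", "hospital": "hospital",
}

# Hypothetical concept -> topic table, standing in for concept clustering.
CONCEPT_TOPIC = {
    "automobile": "transport", "motor": "transport",
    "physician": "medicine", "nurse": "medicine", "hospital": "medicine",
}


def annotate(text):
    """Stage 1: map each known word to its concept; unknown words are dropped."""
    return [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]


def topics(concepts):
    """Stage 2: collapse a document's concepts into weighted topics."""
    return Counter(CONCEPT_TOPIC[c] for c in concepts)


def cosine(a, b):
    """Cosine similarity between two sparse topic-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.5):
    """Stage 3: merge documents whose topic vectors reach the similarity threshold."""
    vecs = [topics(annotate(d)) for d in docs]
    labels = list(range(len(docs)))  # every document starts in its own cluster
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if label == old else label for label in labels]
    return labels


docs = [
    "the car engine failed",
    "my auto has a new motor",
    "the doctor works at the hospital",
    "a nurse called the physician",
]
print(cluster(docs))
```

Because "car" and "auto" map to the same concept, the first two documents fall into one cluster even though they share no surface words, which is the synonym problem the abstract attributes to plain vector space models. Comparing a few topics per document instead of full term vectors is also where the reduced similarity computation would come from.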
Source
Journal of the Hebei Academy of Sciences
CAS
2014, No. 2, pp. 79-82 (4 pages)