
Text clustering model based on ontology (利用本体技术的文本聚类模型)
Abstract: Text clustering is an unsupervised machine learning method with a high degree of automation. It can effectively organize, summarize, and navigate text information, and in recent years it has been widely applied in information retrieval. To address the shortcomings of vector space model clustering in handling synonyms and polysemous words, the author proposes an ontology-based text clustering model. First, the words in each document are semantically annotated using the WordNet dictionary, which yields the document's concept set. Next, the concepts in each document's concept set are clustered to generate the document's concept topics. Finally, text clustering is completed by computing the similarity between topics. The model reduces the amount of similarity computation and improves both clustering results and clustering performance.
Source: Journal of The Hebei Academy of Sciences (《河北省科学院学报》), CAS, 2014, Issue 2, pp. 79-82 (4 pages)
Keywords: ontology; text clustering; concept topic; WordNet
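
The three steps described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the paper's implementation: the two small dictionaries are hypothetical stand-ins for WordNet lookups and its hypernym relation, and the weighted Jaccard measure and the greedy threshold clustering are choices made here for brevity, since the abstract does not say which similarity measure or clustering algorithm the author uses.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to a concept identifier,
# so that synonyms ("car", "auto") collapse onto the same concept.
WORD_TO_CONCEPT = {
    "car": "automobile", "auto": "automobile", "automobile": "automobile",
    "truck": "truck",
    "doctor": "physician", "physician": "physician",
    "nurse": "nurse",
}

# Hypothetical stand-in for a hypernym relation: concept -> broader topic.
CONCEPT_TO_TOPIC = {
    "automobile": "vehicle", "truck": "vehicle",
    "physician": "health_professional", "nurse": "health_professional",
}


def annotate(text):
    """Step 1: map words to concepts, skipping words with no known concept."""
    return [WORD_TO_CONCEPT[w] for w in text.lower().split() if w in WORD_TO_CONCEPT]


def concept_topics(concepts):
    """Step 2: group a document's concepts under shared broader topics."""
    return Counter(CONCEPT_TO_TOPIC[c] for c in concepts)


def topic_similarity(a, b):
    """Step 3: weighted Jaccard similarity between two topic profiles."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    overlap = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return overlap / total


def cluster_documents(docs, threshold=0.5):
    """Greedy clustering: a document joins the first cluster whose seed
    document has a similar enough topic profile, else starts a new cluster."""
    profiles = [concept_topics(annotate(d)) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, profile in enumerate(profiles):
        for cluster in clusters:
            if topic_similarity(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "the car and the truck",
    "an auto beside an automobile",
    "the doctor called a nurse",
    "a physician and a nurse",
]
print(cluster_documents(docs))
```

Because similarity is computed over a handful of topics per document rather than over every term, each comparison touches far fewer dimensions than a full term vector would, which is the kind of saving in similarity computation the abstract refers to.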


