Text Clustering Model Based on Ontology (利用本体技术的文本聚类模型)


Abstract: Text clustering is a highly automated unsupervised machine learning method that supports effective organization, summarization, and navigation of text information, and in recent years it has been widely applied in information retrieval. To address the shortcomings of vector space model based clustering in handling synonyms and polysemous words, the authors propose an ontology-based text clustering model. First, the WordNet dictionary is used to semantically annotate the words in each document, yielding the document's concept set. Next, the concepts in each document's concept set are clustered to generate the document's concept topics. Finally, text clustering is completed by computing the similarity between topics. The model reduces the amount of similarity computation and improves clustering results and performance.
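The three steps named in the abstract (semantic annotation, per-document concept clustering, topic similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the two small dictionaries below are hypothetical stand-ins for WordNet, grouping concepts under a shared parent is a simplified substitute for the paper's concept clustering, and the Jaccard measure, the `threshold` parameter, and all function names are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet: word -> concept.
WORD_TO_CONCEPT = {
    "car": "automobile", "auto": "automobile",
    "truck": "truck", "lorry": "truck",
    "doctor": "physician", "physician": "physician",
    "nurse": "nurse", "hospital": "hospital",
}

# Hypothetical parent links used to group concepts into topics.
CONCEPT_TO_TOPIC = {
    "automobile": "vehicle", "truck": "vehicle",
    "physician": "health_care", "nurse": "health_care",
    "hospital": "health_care",
}


def annotate(text):
    """Step 1: map each known word to a concept; unknown words are dropped."""
    words = text.lower().split()
    return [WORD_TO_CONCEPT[w] for w in words if w in WORD_TO_CONCEPT]


def topics(concepts, top_k=1):
    """Step 2: group concepts by parent and keep the top_k most frequent groups."""
    counts = Counter(CONCEPT_TO_TOPIC[c] for c in concepts)
    return {topic for topic, _ in counts.most_common(top_k)}


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets are treated as dissimilar."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.5):
    """Step 3: link documents with similar topic sets and label the components."""
    doc_topics = [topics(annotate(d)) for d in docs]
    labels = list(range(len(docs)))
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(doc_topics[i], doc_topics[j]) >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


DOCS = [
    "car passed lorry",
    "auto behind truck",
    "doctor met nurse",
    "physician at hospital",
]

if __name__ == "__main__":
    # The first two documents share no surface words at all.
    print(jaccard(set(DOCS[0].split()), set(DOCS[1].split())))
    print(cluster(DOCS))
```

In this toy data the first two documents have no words in common, so a plain bag-of-words comparison scores them as unrelated, while the concept-level comparison places them in the same cluster. Comparing one small topic set per document, rather than full term vectors, is also where a reduction in similarity computation would come from.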

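Authors: Li Shaobo (李少博), Di Shuling (邸书灵), Fan Tongrang (范通让)

Affiliation: School of Information Science and Technology, Shijiazhuang Tiedao University

Source: Journal of The Hebei Academy of Sciences (《河北省科学院学报》), CAS, 2014, No. 2, pp. 79-82 (4 pages)

Keywords: ontology; text clustering; concept topic (the subject of document); WordNet

Classification: TP391.12 [Automation and Computer Technology: Computer Application Technology]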



