
Document clustering approach based on term clustering and association rules
Abstract This paper proposes a new document clustering approach based on term clustering and association rules. The document collection is first segmented into words, and term clusters are formed according to the average mutual information (AMI) between terms. The term clusters are then used to represent documents in the Vector Space Model (VSM). Association rules are used to mine an initial document clustering, and a clustering analysis of that initial result yields the final document clusters. Experimental results show that, compared with traditional clustering methods, the approach runs faster and clearly improves both clustering effectiveness and clustering quality.
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2007, No. 5, pp. 178-181, 188 (5 pages)
Funding National Natural Science Foundation of China (Grant No. 70571056); Hebei Province Science and Technology Research and Development Program (04213534)
Keywords term clustering; association rules; document clustering; Web mining; Vector Space Model
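
The abstract names the steps but this record does not give their formulas. As a rough illustration only, the sketch below computes a mutual-information score between two terms from their presence or absence across documents, and then maps a tokenized document onto a vector with one dimension per term cluster. The exact AMI definition used in the paper is not stated here, so the binary-occurrence formulation, the function names `average_mutual_information` and `to_cluster_vector`, and the count-based vector weighting are all assumptions made for this example.

```python
import math
from itertools import product


def average_mutual_information(docs, term_a, term_b):
    """Mutual information (bits) between the presence of two terms across docs.

    Assumption: each term is treated as a binary variable (present/absent
    in a document); the paper's own AMI formula may differ.
    """
    n = len(docs)
    if n == 0:
        return 0.0
    # Joint counts over (term_a present?, term_b present?).
    joint = {(x, y): 0 for x, y in product((0, 1), repeat=2)}
    for doc in docs:
        joint[(int(term_a in doc), int(term_b in doc))] += 1
    # Marginal probabilities of presence/absence for each term.
    p_a = {x: (joint[(x, 0)] + joint[(x, 1)]) / n for x in (0, 1)}
    p_b = {y: (joint[(0, y)] + joint[(1, y)]) / n for y in (0, 1)}
    ami = 0.0
    for (x, y), count in joint.items():
        if count == 0:
            continue  # zero-probability cells contribute nothing
        p_xy = count / n
        ami += p_xy * math.log2(p_xy / (p_a[x] * p_b[y]))
    return ami


def to_cluster_vector(doc, term_clusters):
    """Represent a tokenized document by token counts per term cluster.

    Assumption: raw counts are used as weights purely for illustration.
    """
    return [sum(1 for tok in doc if tok in cluster) for cluster in term_clusters]


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is a list of tokens.
    docs = [["web", "mining"], ["web", "mining"], ["cluster"], ["cluster"]]
    print(average_mutual_information(docs, "web", "mining"))
    print(to_cluster_vector(["web", "mining", "cluster"],
                            [{"web", "mining"}, {"cluster"}]))
```

Note that mutual information is high for terms that always co-occur and also for terms that never co-occur, so a practical term-clustering step would likely need an additional co-occurrence check before grouping two terms together.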