Subspace Clustering Algorithm for High-Dimensional Categorical Attributes (cited by: 6)

Clustering Algorithm for Mining Subspace Clusters in Categorical Datasets
Abstract: Processing high-dimensional categorical data has long been a major challenge in data mining research. Traditional clustering algorithms mainly target low-dimensional continuous data and handle high-dimensional categorical datasets poorly. This paper proposes a subspace clustering algorithm for high-dimensional categorical datasets, FPSUB (FP-Tree-based SUBspace clustering algorithm). It stores the dataset information in a frequent-pattern tree (FP-Tree), which turns the clustering problem into the problem of finding frequent patterns of attribute values. The resulting frequent patterns serve as candidate subspaces, and the objects are then clustered on the basis of these subspaces. Experiments on real datasets show that FPSUB achieves higher accuracy than other algorithms and demonstrate the feasibility and robustness of the algorithm.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 10, pp. 2016-2021 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (Grants 70671016 and 60673066)
Keywords: categorical data; subspace clustering; frequent pattern; FP-Tree
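
The abstract's central idea, treating attribute values as items so that frequent patterns become candidate subspaces, can be illustrated with a small sketch. This is not the FPSUB algorithm itself: the abstract gives no implementation details, so the sketch swaps FP-Tree mining for brute-force itemset counting, and the function names, the `min_support` and `max_size` parameters, the assignment rule (largest matching pattern), and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def to_items(record):
    """Encode a categorical record as a set of (attribute, value) items."""
    return frozenset(record.items())


def frequent_patterns(records, min_support, max_size=3):
    """Return itemsets of (attribute, value) pairs with support >= min_support.

    Assumption: brute-force counting stands in for FP-Tree mining, which
    would find the same patterns more efficiently on large datasets.
    """
    counts = Counter()
    for record in records:
        items = sorted(to_items(record))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def assign_to_subspaces(records, patterns):
    """Group record indices under the largest frequent pattern each contains.

    Assumption: this assignment rule is hypothetical; ties are broken by
    support and then by the sorted items to keep the result deterministic.
    """
    clusters = {}
    for i, record in enumerate(records):
        items = to_items(record)
        matching = [p for p in patterns if p <= items]
        if not matching:
            continue  # record contains no frequent pattern
        best = max(matching, key=lambda p: (len(p), patterns[p], sorted(p)))
        clusters.setdefault(best, []).append(i)
    return clusters


# Toy data (hypothetical): four objects described by three categorical attributes.
records = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "large"},
    {"color": "blue", "shape": "square", "size": "small"},
    {"color": "blue", "shape": "square", "size": "large"},
]

patterns = frequent_patterns(records, min_support=2)
clusters = assign_to_subspaces(records, patterns)

for pattern, members in clusters.items():
    subspace = sorted(attr for attr, _ in pattern)
    print(subspace, sorted(pattern), members)
```

In this toy run the attributes of each winning pattern (`color` and `shape`) form the subspace in which the cluster is defined, while `size` is left out because its values do not co-occur frequently with the others.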