An Algorithm for High-Dimensional Categorical Data Clustering Using Rough Set Theory
Abstract: Most well-established clustering algorithms are designed for low-dimensional data. They break down on high-dimensional data, whose distribution characteristics differ markedly from the low-dimensional case. To address the clustering of high-dimensional categorical data, this paper proposes a rough-set-based subspace clustering algorithm. The algorithm describes cluster boundaries with the upper and lower approximations of rough set theory, which determines the range of each boundary, and then uses a consistency degree to adjust the boundaries. Clustering follows a growing-subspace strategy that iteratively searches for subspace clusters from low to high dimensions. Comparative experiments on the soybean and zoo data sets indicate that the algorithm is feasible and achieves high accuracy.
Affiliation: College of Engineering, Shantou University
Source: Journal of Shantou University (Natural Science Edition), 2012, No. 4, pp. 46-53 (8 pages)
Funding: National Natural Science Foundation of China (61170130)
Keywords: high-dimensional categorical data; growing subspace; information entropy; rough set; clustering
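The abstract relies on the upper and lower approximations of rough set theory to delimit cluster boundaries. As a minimal sketch of that one ingredient, the Python below computes the standard Pawlak approximations of a target set over a small categorical table. The table, the attribute indices, and the function names `equivalence_classes` and `approximations` are illustrative assumptions, not taken from the paper. The consistency-degree adjustment and the growing-subspace search are not shown, because the abstract does not specify them.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Group row indices whose values agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks[key].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target):
    """Return the (lower, upper) approximation of target with respect to attrs."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical toy table of categorical records (not data from the paper).
rows = [
    ("yes", "hair"),      # object 0
    ("yes", "hair"),      # object 1
    ("no", "feathers"),   # object 2
    ("no", "scales"),     # object 3
    ("no", "scales"),     # object 4
]
target = {0, 1, 3}        # candidate cluster to approximate

lower, upper = approximations(rows, attrs=(0, 1), target=target)
boundary = upper - lower  # objects whose membership is uncertain
print(lower, upper, boundary)
```

Objects 3 and 4 are indistinguishable on the chosen attributes, yet only object 3 belongs to the target, so both fall in the boundary region. A boundary of this kind is presumably what the paper's consistency degree is meant to resolve.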