An Effective High Dimensional Categorical Data Clustering Method Research
Cited by: 2
Abstract: As data sets keep growing, improving the running efficiency of the K-modes and fuzzy K-modes clustering algorithms has become an important problem. To speed these algorithms up, this paper proposes a clustering method for high-dimensional categorical data based on divide-and-conquer. Instead of clustering all of the data in a single pass, the method splits the categorical data set into several subsets, clusters all subsets simultaneously, and then fuses the subset clustering results into the final clustering. Experimental results show that, in most cases, the method clusters significantly faster than the traditional approach.
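The abstract names three steps: divide the data set, cluster the subsets simultaneously, and fuse the results. The Python sketch below illustrates that shape under stated assumptions; it is not the authors' implementation. Assumed: plain K-modes with the simple matching dissimilarity as the per-subset clusterer, a round-robin split into subsets, threads for the concurrent step, and a fusion rule that re-clusters the local modes and then reassigns every record. All function names (`mismatch`, `column_mode`, `nearest`, `k_modes`, `divide_and_conquer_k_modes`) are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_mode(records):
    """Most frequent value of each attribute, i.e. the mode of a set of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def nearest(record, modes):
    """Index of the mode closest to record (ties go to the lowest index)."""
    return min(range(len(modes)), key=lambda j: mismatch(record, modes[j]))


def k_modes(records, k, max_iter=20, seed=0):
    """Basic K-modes on a list of categorical tuples; returns (modes, labels)."""
    distinct = list(dict.fromkeys(records))
    if len(distinct) < k:
        raise ValueError("need at least k distinct records")
    modes = random.Random(seed).sample(distinct, k)  # k distinct initial modes
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(r, modes) for r in records]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_mode(members)
    return modes, labels


def divide_and_conquer_k_modes(records, k, n_subsets=4, seed=0):
    """Hypothetical divide-and-conquer K-modes: split, cluster subsets concurrently, fuse."""
    if len(records) < n_subsets * k:
        raise ValueError("each subset needs at least k records")
    # Divide: round-robin split into n_subsets disjoint subsets
    # (shuffle first if the input is sorted by class).
    subsets = [records[s::n_subsets] for s in range(n_subsets)]
    # Conquer: cluster every subset independently and concurrently.
    # Threads keep the sketch simple; real CPU speedup in CPython needs processes.
    with ThreadPoolExecutor(max_workers=n_subsets) as pool:
        results = list(pool.map(lambda sub: k_modes(sub, k, seed=seed), subsets))
    # Fuse (assumed rule): cluster the n_subsets * k local modes into k global
    # modes, then assign every record to its nearest global mode.
    local_modes = [m for modes, _ in results for m in modes]
    global_modes, _ = k_modes(local_modes, k, seed=seed)
    labels = [nearest(r, global_modes) for r in records]
    return global_modes, labels


if __name__ == "__main__":
    data = [("a", "x", "p")] * 30 + [("b", "y", "q")] * 30 + [("c", "z", "r")] * 30
    modes, labels = divide_and_conquer_k_modes(data, k=3, n_subsets=3)
    print(sorted(modes))  # [('a', 'x', 'p'), ('b', 'y', 'q'), ('c', 'z', 'r')]
```

Each subset is clustered without looking at the others, which is what allows the work to be spread across workers; only the small set of local modes is revisited in the fusion step. A fuzzy K-modes routine could be dropped into the conquer step in place of `k_modes`. The fusion rule shown here is only a placeholder, since the abstract does not specify how the paper merges the subset results.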
Source: Microelectronics & Computer (《微电子学与计算机》), 2011, No. 6, pp. 88-91 (4 pages). Indexed in CSCD; Peking University core journal.
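Funding: National Natural Science Foundation of China (60970014); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (200801080006); Key Project of Scientific and Technological Research, Ministry of Education (207018); Open Fund of Shanxi Provincial Key Laboratory (2007031017); Taiyuan Science and Technology Star Special Fund (09121001).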
Keywords: clustering analysis; fuzzy clustering; divide-and-conquer method; large categorical data sets; evaluation index