Approximate k-median Clustering for Categorical Data

Abstract: Data mining offers many algorithms for clustering categorical data, but most partition-based methods produce cluster centers that are generally not actual objects in the data set. Such centers lack practical physical meaning and can sometimes leave a cluster empty. This paper studies an algorithm for solving the approximate k-median problem and, building on it, proposes an approximate k-median clustering algorithm for categorical data. The algorithm replaces the modes of the k-modes algorithm with approximate medians of the data and optimizes the cluster centers with the approximate k-median algorithm. Because each cluster center is an actual sample from the data set, empty clusters are prevented, which overcomes the problems commonly met in partition-based clustering of categorical data. Simulation experiments indicate that the algorithm is effective.
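The abstract does not give the distance measure or the approximation scheme, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the simple matching dissimilarity commonly used with k-modes, and it approximates each cluster's set median by scoring members against a random reference subset; the names `hamming`, `approx_median`, `approx_k_median` and the parameter `n_refs` are hypothetical.

```python
import random

def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ (assumed)."""
    return sum(x != y for x, y in zip(a, b))

def assign(data, centers):
    """Label each object with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: hamming(x, centers[j]))
            for x in data]

def approx_median(members, n_refs, rng):
    """Pick the member with the smallest summed distance to a random
    reference subset; this approximates the exact set median (assumed scheme)."""
    refs = rng.sample(members, min(n_refs, len(members)))
    return min(members, key=lambda m: sum(hamming(m, r) for r in refs))

def approx_k_median(data, k, n_refs=10, max_iter=20, seed=0):
    rng = random.Random(seed)
    # Start from k distinct actual samples, so every center owns at least itself.
    centers = rng.sample(sorted(set(data)), k)
    for _ in range(max_iter):
        labels = assign(data, centers)
        new_centers = [
            approx_median([x for x, l in zip(data, labels) if l == j], n_refs, rng)
            for j in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(data, centers)

# Toy categorical data: each object is a tuple of attribute values.
data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r")]
centers, labels = approx_k_median(data, k=2)
print(centers, labels)
```

In this sketch every center is drawn from the cluster's own members, so it is always a real data object, and a cluster always contains at least its own center. That mirrors the property the abstract highlights, in contrast to a k-modes center assembled attribute by attribute.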

Authors: Zhao Heng, Zhang Gaoyu

Affiliation: School of Electronic Engineering, Xidian University

Source: Computer Engineering (《计算机工程》), 2007, No. 8, pp. 66-67, 70 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

Keywords: data mining; approximate k-median; clustering; categorical data

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
