Abstract
To improve the quality of data stream clustering when the data are unevenly distributed or contain noise, an effective twice-clustering algorithm for data streams, TCLUSA, is proposed. Based on the divide-and-conquer (partitioning) idea and separability theorems, TCLUSA clusters each partition of the stream with DBSCAN (density-based spatial clustering of applications with noise) and takes the mean point of each resulting cluster as that cluster's representative (reference) point. It then clusters all the representative points with k-means to obtain the final result. A layered structure stores the cluster reference points produced by each round of clustering until the final result is reached. Theoretical analysis and experimental results show that TCLUSA effectively improves clustering quality when the data distribution is irregular or when high-dimensional data streams are processed.
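The two-stage procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' TCLUSA implementation: the helper names (`dbscan`, `kmeans`, `two_stage_cluster`), fixed-size chunking of the stream, unweighted cluster means, the deterministic farthest-point k-means initialisation, and the two-level `layers` list standing in for the paper's layered structure are all hypothetical choices.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[List[Point]]:
    """Minimal DBSCAN: returns the density clusters; noise points are dropped."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited
    n_clusters = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = n_clusters
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = n_clusters  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = n_clusters
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
        n_clusters += 1
    return [[points[i] for i in range(len(points)) if labels[i] == c]
            for c in range(n_clusters)]


def mean_point(cluster: Sequence[Point]) -> Point:
    """Mean point of a cluster, used as its representative (reference) point."""
    d = len(cluster[0])
    return tuple(sum(p[k] for p in cluster) / len(cluster) for k in range(d))


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain k-means with deterministic farthest-point initialisation (assumed)."""
    centers: List[Point] = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        new_centers = [mean_point(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def two_stage_cluster(stream: Sequence[Point], chunk_size: int,
                      eps: float, min_pts: int, k: int) -> List[List[Point]]:
    """Hypothetical two-level layout: layers[0] holds the density-cluster
    reference points of every chunk, layers[1] the final k-means reference points."""
    layers: List[List[Point]] = [[], []]
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        for cluster in dbscan(chunk, eps, min_pts):
            layers[0].append(mean_point(cluster))
    if layers[0]:
        layers[1] = kmeans(layers[0], k)
    return layers


if __name__ == "__main__":
    # Two chunks, each with two small dense groups and one isolated noise point.
    demo = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (10.0, 10.0), (10.2, 10.0),
            (10.0, 10.2), (50.0, -50.0), (0.1, 0.1), (0.3, 0.1), (0.1, 0.3),
            (10.1, 10.1), (10.3, 10.1), (10.1, 10.3), (-40.0, 40.0)]
    print(two_stage_cluster(demo, chunk_size=7, eps=0.5, min_pts=3, k=2))
```

In this sketch the noise points are discarded by DBSCAN inside each chunk, so they never reach the k-means stage. This mirrors the motivation stated in the abstract: k-means on its own is sensitive to noise and to clusters with irregular shapes. Running k-means only on the per-chunk mean points also keeps the second stage small, since its input grows with the number of density clusters rather than with the number of raw stream records.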
Source
Journal of Southwest Jiaotong University
EI
CSCD
Peking University Core Journal
2009, No. 4, pp. 490-494 (5 pages)
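Funding
Natural Science Foundation of Anhui Province (050420207)
Anhui Provincial Research Funding Program for Young Teachers in Higher Education Institutions (2005jq1012)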
Keywords
data stream clustering
reference point of density cluster
k-means reference point