Abstract
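Traditional clustering algorithms tend to identify spherical clusters of similar size and are sensitive to outliers. To address these problems, this paper designs an effective clustering algorithm based on selecting representative points for each cluster. The method first selects a number of well-scattered data points from a cluster and then shrinks them toward the cluster's centroid; the resulting points serve as the cluster's representatives. By selecting multiple representative points, the algorithm can capture the geometric features of clusters of different shapes and is less affected by outliers. Experimental results show that the algorithm is effective on complex data.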
Traditional clustering algorithms favor spherical clusters of similar size and are sensitive to outliers. To address these problems, this paper designs an effective clustering algorithm that uses representative points of clusters. In this approach, a number of well-scattered data points are first selected from each cluster and then shrunk toward the centroid of the cluster. The resulting points are the representatives of the cluster. In each hierarchical clustering step of the proposed algorithm, the pair of clusters with the smallest distance among all pairs of clusters is merged into one cluster. The distance between two clusters is computed as the smallest distance over all pairs of representative points, where one point is taken from each cluster. By selecting multiple representative points in this way, the algorithm can capture the geometric features of clusters with different shapes and sizes, and it is less sensitive to outliers. Experimental results demonstrate the effectiveness of the algorithm in clustering complex data.
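The steps described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the number of representative points, the shrink factor, the rule for picking well-scattered points, or the stopping criterion. The parameters `c`, `alpha`, and `k`, the farthest-point selection heuristic, and all function names are hypothetical choices made for this example.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def scattered_points(points, c):
    """Pick up to c well-scattered points (assumed farthest-point heuristic)."""
    center = centroid(points)
    # Start with the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, center))]
    while len(chosen) < min(c, len(points)):
        remaining = [p for p in points if p not in chosen]
        if not remaining:  # only duplicates of already chosen points are left
            break
        # Add the point whose nearest chosen point is farthest away.
        chosen.append(max(remaining,
                          key=lambda p: min(math.dist(p, q) for q in chosen)))
    return chosen


def representatives(points, c=4, alpha=0.3):
    """Shrink the scattered points toward the centroid by fraction alpha."""
    center = centroid(points)
    return [tuple(x + alpha * (m - x) for x, m in zip(p, center))
            for p in scattered_points(points, c)]


def cluster_distance(reps_a, reps_b):
    """Smallest distance over all pairs of representatives, one per cluster."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


def cluster(points, k, c=4, alpha=0.3):
    """Merge the closest pair of clusters until k clusters remain (assumed stop rule)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        reps = [representatives(cl, c, alpha) for cl in clusters]
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: cluster_distance(reps[ij[0]], reps[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

For instance, `cluster(points, k=2)` on two well-separated groups of 2-D points returns one list per group. The sketch recomputes every cluster's representatives in each step for clarity; this is simple but slow, and a practical version would cache representatives and use a spatial index for the nearest-pair search.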
Source
《模式识别与人工智能》
EI
CSCD
PKU Core (Peking University Core Journals)
2001, No. 4, pp. 417-422 (6 pages)
Pattern Recognition and Artificial Intelligence
Funding
National Natural Science Foundation of China