Abstract
Most clustering algorithms require the number of clusters to be specified manually. Building on an algorithm that computes the number of clusters automatically from a similarity threshold, this paper changes how the initial samples are selected and proposes an improved algorithm for automatically computing the number of clusters. When selecting initial samples, the improved algorithm gives priority to samples that lie close to the cluster centers generated in the previous iteration and have low similarity to the center of the sample set. The improved algorithm not only computes the number of clusters automatically but is also more stable and more accurate.
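The selection rule described in the abstract can be illustrated with a rough sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: cosine similarity as the similarity measure, the ranking score `near_prev - to_global`, a single-pass threshold assignment, mean vectors as cluster centers (rather than medoids), and all function names. It is not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def order_initial_samples(samples, prev_centers):
    """Rank samples so that those close to a previous-iteration center and
    dissimilar to the center of the whole sample set come first.
    The score used here is a hypothetical choice, not taken from the paper."""
    global_center = mean(samples)

    def score(s):
        near_prev = max((cosine(s, c) for c in prev_centers), default=0.0)
        to_global = cosine(s, global_center)
        return near_prev - to_global

    return sorted(samples, key=score, reverse=True)


def threshold_pass(samples, threshold, prev_centers):
    """One pass: a sample joins its most similar cluster when the similarity
    reaches the threshold, otherwise it opens a new cluster."""
    clusters, centers = [], []
    for s in order_initial_samples(samples, prev_centers):
        best, best_sim = -1, -1.0
        for i, c in enumerate(centers):
            sim = cosine(s, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(s)
            centers[best] = mean(clusters[best])
        else:
            clusters.append([s])
            centers.append(list(s))
    return centers, clusters


def auto_k(samples, threshold, iterations=3):
    """Repeat the pass, feeding each iteration's centers into the next one.
    The number of clusters is whatever the threshold produces."""
    centers, clusters = [], []
    for _ in range(iterations):
        centers, clusters = threshold_pass(samples, threshold, centers)
    return len(centers), centers
```

In this sketch the number of clusters is never passed in; it follows from the similarity threshold, and the ordering of samples only influences which samples seed the clusters.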
Funding
Science and Technology Fund Project of the Chongqing Municipal Education Commission (No. KJ091306)
Keywords
Similarity Threshold
Cluster Center
K-Medoids
Clustering Algorithm