Abstract
The classic k-means algorithm is one of the most widely used clustering methods. This paper discusses its strengths and weaknesses. Its two main shortcomings are that it selects its initial centroids at random, which makes the result sensitive to the choice of initial cluster centers, and that it is easily affected by outliers. To address these shortcomings, a twice-improved k-means algorithm is presented. First, outliers are removed using a distance-based method. Then, the selection of the initial cluster centers is improved using a nearest-neighbor absorption method. Comparative experiments on the algorithm before and after the improvements show that the improved algorithm is more stable and more accurate, and that it is less affected by outliers and by random centroid selection.
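The abstract does not state the exact outlier criterion or absorption rule. The following is therefore only a minimal Python sketch of the two-step pipeline under stated assumptions, not the paper's implementation. It assumes that a point counts as an outlier if fewer than `min_pts` other points lie within `radius` of it. It also reads "nearest-neighbor absorption" as follows: repeatedly take the remaining point whose `n // k` nearest remaining neighbors are most tightly packed, absorb that group, and use its mean as an initial center. All function and parameter names are hypothetical.

```python
import math

def remove_outliers(points, radius, min_pts):
    """Assumed distance-based filter: keep a point only if at least
    min_pts other points lie within `radius` of it."""
    kept = []
    for i, p in enumerate(points):
        neighbours = sum(1 for j, q in enumerate(points)
                         if j != i and math.dist(p, q) <= radius)
        if neighbours >= min_pts:
            kept.append(p)
    return kept

def mean(pts):
    """Component-wise mean of a non-empty list of points."""
    dim = len(pts[0])
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))

def absorb_initial_centers(points, k):
    """Hypothetical nearest-neighbour absorption seeding: k times, pick the
    remaining point whose m = n // k nearest remaining points (itself
    included) have the smallest total distance, absorb that group, and use
    its mean as an initial center."""
    m = max(1, len(points) // k)
    remaining = list(range(len(points)))
    centers = []
    for _ in range(k):
        best, best_cost = None, math.inf
        for i in remaining:
            nearest = sorted(remaining,
                             key=lambda j: math.dist(points[i], points[j]))[:m]
            cost = sum(math.dist(points[i], points[j]) for j in nearest)
            if cost < best_cost:
                best, best_cost = nearest, cost
        centers.append(mean([points[j] for j in best]))
        absorbed = set(best)
        remaining = [j for j in remaining if j not in absorbed]
    return centers

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)),
                      key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def improved_kmeans(points, k, radius, min_pts):
    """Remove outliers first, then seed by absorption, then run k-means."""
    kept = remove_outliers(points, radius, min_pts)
    if len(kept) < k:
        raise ValueError("fewer points than clusters after outlier removal")
    return kmeans(kept, absorb_initial_centers(kept, k))
```

For example, `improved_kmeans([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)], k=2, radius=2.0, min_pts=2)` first drops `(50, 50)` and then converges to centers `(0.5, 0.5)` and `(10.5, 10.5)`. The seeding step is deterministic, so repeated runs give the same result. This illustrates the stability argument of the abstract. The quadratic neighbor searches are kept for clarity and are not tuned for large data sets.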
Source
Journal of Shaanxi University of Technology: Natural Science Edition
2009, No. 3, pp. 45-49 (5 pages)
Funding
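Science and Technology Research Project of the Heilongjiang Provincial Department of Education (No. 11521008)
Natural Science Foundation of Heilongjiang Province (No. F200603)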
Keywords
k-means algorithm
outliers
initial cluster centers
distance