Abstract
Because the traditional K-Means clustering algorithm assigns its initial centroids at random, its clustering results fluctuate considerably, and existing research on initial centroid assignment still has shortcomings. To address this problem, this paper adopts an equal-division method. The sample data are first cleaned by screening out points with large deviations. The processed data are then divided evenly into several equal parts in the two-dimensional plane, and the parts are sorted and summarized to compute the best initial centroids for the first iteration of the clustering algorithm. Metrics such as the sum of squared errors (SSE) are used to update the centroids during iteration, and the sample data are finally partitioned into meaningful clusters. Experimental results show that this optimization of K-Means reduces the number of centroid iterations to some extent, saves time, and improves accuracy, indicating that the approach is effective and practical for optimizing initial centroid assignment.
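The pipeline summarized in the abstract (clean the data, split it into equal parts, derive initial centroids, then iterate while tracking SSE) can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact rules, so every detail here is an assumption made for illustration: the outlier threshold `z`, sorting points lexicographically before splitting them into `k` equal-sized parts, taking each part's mean as its initial centroid, and stopping when SSE no longer decreases. The function names `remove_outliers`, `equal_split_centroids`, and `kmeans` are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Point = Tuple[float, float]


def remove_outliers(points: List[Point], z: float = 2.0) -> List[Point]:
    """Drop points unusually far from the overall mean (assumed cleaning rule)."""
    center = (mean(p[0] for p in points), mean(p[1] for p in points))
    dists = [math.dist(p, center) for p in points]
    # Assumed threshold: mean distance plus z population standard deviations.
    limit = mean(dists) + z * pstdev(dists)
    return [p for p, d in zip(points, dists) if d <= limit]


def equal_split_centroids(points: List[Point], k: int) -> List[Point]:
    """Sort the points, cut them into k equal-sized parts, return each part's mean.

    Assumes 1 <= k <= len(points) so that no part is empty.
    """
    ordered = sorted(points)  # lexicographic order: by x, then by y
    n = len(ordered)
    centroids = []
    for i in range(k):
        part = ordered[i * n // k:(i + 1) * n // k]
        centroids.append((mean(p[0] for p in part), mean(p[1] for p in part)))
    return centroids


def sse(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))


def kmeans(points: List[Point], init: List[Point],
           max_iter: int = 100, tol: float = 1e-9):
    """Standard Lloyd iterations started from the given initial centroids."""
    centroids = list(init)  # copy so the caller's list is not modified
    labels: List[int] = []
    current = math.inf
    previous = math.inf
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (mean(p[0] for p in members),
                                mean(p[1] for p in members))
        current = sse(points, centroids, labels)
        if previous - current <= tol:  # SSE stopped improving
            break
        previous = current
    return centroids, labels, current, iterations
```

A typical call chain under these assumptions would be `clean = remove_outliers(points)`, then `init = equal_split_centroids(clean, k)`, then `kmeans(clean, init)`. Because the starting centroids are computed deterministically from the data, repeated runs on the same input give the same clustering, which is the property the abstract contrasts with random initialization.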
Authors
HE Jialun (何嘉伦)
MA Chong (马冲)
School of Software, Xinjiang University, Urumqi, Xinjiang 830000
Source
Changjiang Information & Communications (《长江信息通信》)
2023, No. 6, pp. 69-72, 75 (5 pages in total)