摘要
I-k-means-+算法作为一种新的k-means目标函数优化算法,通过分裂与删除簇提高解的质量,在一定程度上克服了k-means算法容易陷入局部最优解而导致目标函数优化效果不佳的问题,但该算法采用一种比较粗糙的方式估计各簇的Gain值和Cost值,影响了目标函数优化效果。针对此问题本文提出了一种基于模拟划分的SP-k-means-+算法,根据各簇模拟划分的情况,更准确地计算各簇的Gain值和Cost值,降低了簇对匹配过程中漏检与误判的可能性,在每次迭代中选择更合适的簇对执行分裂删除操作,进一步优化了目标函数并且避免了无效迭代造成的冗余计算问题。实验结果表明:当无需-+操作时,本文算法与I-kmeans-+算法的目标函数一致且效率提升了16%;当需要-+操作时,本文算法在不降低计算效率的前提下目标函数优化效果较I-k-means-+算法更佳,聚类模型解的精度提高了10%以上,最高达到47%。
K-means algorithm is easy to fall into local optimal solution,which leads to poor optimization effect of objective function,especially when k-means is applied to data sets which the number of data elements,data dimension and clusters are large. To solve this problem,iterative k-means minus–plus(Ik-means-+)is proposed in 2018,as a new clustering algorithm of objective function optimization for kmeans,which improves iteratively the quality of solution of clustering by removing one cluster(minus),dividing another one(plus),from the generated clustering solution,and updating each cluster with topical k-means. However,I-k-means-+ algorithm roughly estimates the gain value and cost value of each cluster,sacrifices precision for lower computational complexity and affects the optimization of objective function. Focus on this problem,in this paper,there is a modified I-k-means-+ algorithm called SP-kmeans-+ which propose to calculate the gain value and cost value of each cluster more accurately,based on simulated partition of every cluster. It reduces the possibility of missing detection and misjudgment in the process of cluster pair matching and avoids the redundant computation caused by invalid iteration in I-kmeans-+ algorithm,selects more suitable cluster pairs to divide or delete in each iteration. SP-k-means-+ algorithm optimizes the objective function further,improves partial computing efficiency,and the simulated partition is accelerating by a new heuristic acceleration algorithm. Results of experiments according to real data set show that,given the same initial clustering solution,when there is no-+process,the efficiency of SP-k-means-+ algorithm is improved by 16%;when there is-+ process,SP-k-means-+ algorithm is improved by 10%—47% in objective function optimization compared with Ik-means-+ algorithm,meanwhile two algorithm has close runtimes.
作者
杨勇
陈强
曲福恒
刘俊杰
张磊
YANG Yong;CHEN Qiang;QU Fu-heng;LIU Jun-jie;ZHANG Lei(College of Computer Science and Technology,Changchun University of Science and Technology,Changchun 130022,China;College of Education,Changchun Normal University,Changchun 130032,China)
出处
《吉林大学学报(工学版)》
EI
CAS
CSCD
北大核心
2021年第5期1808-1816,共9页
Journal of Jilin University:Engineering and Technology Edition
基金
吉林省教育厅科研项目(JJKH20181164KJ)
国家自然科学基金项目(41671397)
吉林省教育科学“十三五”规划项目(GH19086).