Abstract
Traditional spectral clustering has high computational complexity and heavy resource usage when applied to large-scale data, which degrades clustering quality or makes clustering infeasible altogether. To address this, an improved spectral clustering algorithm that combines a parallel framework with sampling is proposed. Building on adaptive similarity matrix computation, the algorithm uses data partitioning and one-way node parallelism to make computing the similarity matrix more efficient. Nyström weighted sampling approximation is then used to reduce the computational complexity of finding the eigenvectors of the Laplacian matrix. Finally, a KD-tree structure is used to avoid distance computations in the k-means clustering step, which improves clustering efficiency. Simulation results show that the proposed algorithm achieves clustering performance close to that of traditional algorithms while attaining a better speedup, which verifies its good adaptability to large-scale datasets.
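The last step in the abstract, using a KD tree to cut down distance computations during k-means, can be illustrated with a minimal sketch. The abstract does not say how the tree is built or queried, so the code below is an assumption rather than the authors' method: it indexes the current centroids in a KD tree and answers each nearest-centroid query with branch pruning instead of comparing against every centroid. The names `build_kdtree`, `nearest`, and `assign_clusters` are hypothetical.

```python
import random


def build_kdtree(items, depth=0):
    """Build a KD tree over (index, point) pairs, splitting on axes in turn."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return {
        "item": items[mid],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (squared_distance, index) of the stored point closest to query."""
    if node is None:
        return best
    idx, point = node["item"]
    d2 = sum((a - b) ** 2 for a, b in zip(point, query))
    if best is None or d2 < best[0]:
        best = (d2, idx)
    # Signed offset of the query from the splitting plane of this node.
    diff = query[node["axis"]] - point[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def assign_clusters(data, centroids):
    """k-means assignment step: label each point with its nearest centroid."""
    tree = build_kdtree(list(enumerate(centroids)))
    return [nearest(tree, x)[1] for x in data]


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(3)] for _ in range(1000)]
    centroids = random.sample(data, 8)
    labels = assign_clusters(data, centroids)
    print(labels[:10])
```

In a full k-means loop the tree would be rebuilt after every centroid update. Other KD-tree variants index the data points rather than the centroids, and the paper's construction may differ from this sketch.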
Authors
郝笑弘
尹青山
HAO Xiao-hong; YIN Qing-shan (School of Shanxi Conservancy Technical Institute, Taiyuan, Shanxi 030032, China; Software College, Jilin University, Changchun, Jilin 130012, China)
Source
《机械设计与制造》
Peking University Core Journal (北大核心)
2021, No. 10, pp. 211-214 (4 pages)
Machinery Design & Manufacture
Funding
Asia-Pacific Economic Cooperation (APEC) project (No. ZGYZJY2019YB).
Keywords
Large-scale Spectral Clustering
Adaptive Similarity Matrix Calculation
One-way Node Parallel Computing
Nyström Weighted Sampling
KD-Tree Optimization