Abstract
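This paper proposes a new clustering algorithm, NGKCA, which overcomes the shortcomings of classical clustering algorithms in detection rate and stability and is suited to clustering problems in big-data environments. NGKCA consists of four phases. First, the NJW spectral clustering algorithm reduces the column dimension of the large data set and normalizes the data. Second, particle swarm optimization (PSO), which is insensitive to initial values, reduces the row dimension of the data set and thereby selects a temporary set of cluster centers. Third, the global k-means algorithm clusters the best cluster-center set to obtain the cluster center points. Finally, PSO adjusts these center points to produce the final cluster partition. Experiments on several well-known machine learning data sets and on KDDCUP99, an international standard network security data set, show that the proposed algorithm achieves better stability and a higher detection rate than common algorithms such as spectral clustering, k-means, PSO and global k-means, and has lower time complexity than global k-means.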
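The third phase can be illustrated with a minimal sketch of the standard incremental global k-means procedure. It starts from one center at the data mean, then adds centers one at a time: every input point is tried as the initial position of the new center, and the run with the lowest sum of squared errors is kept. This is not the authors' implementation. The names `global_kmeans`, `kmeans` and `sse` are hypothetical. The sketch also assumes that in NGKCA the input `points` would be the temporary center set selected by PSO rather than the full data set. That assumption is one plausible reason the method is cheaper than running global k-means on all the data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points: Sequence[Point], centers: List[Point], max_iter: int = 100) -> List[Point]:
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    dim = len(centers[0])
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Recompute each center as its cluster mean; an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:  # assignments stable, converged
            break
        centers = new_centers
    return centers


def global_kmeans(points: Sequence[Point], k: int) -> List[Point]:
    """Incremental global k-means: grow from 1 to k centers, trying every point as the new one."""
    dim = len(points[0])
    # k = 1: the optimal single center is the mean of all points.
    centers = [tuple(sum(p[d] for p in points) / len(points) for d in range(dim))]
    for _ in range(2, k + 1):
        best_err, best_centers = float("inf"), centers
        for cand in points:
            trial = kmeans(points, centers + [tuple(cand)])
            err = sse(points, trial)
            if err < best_err:
                best_err, best_centers = err, trial
        centers = best_centers
    return centers
```

For example, `global_kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` separates the two groups, with centers at (1/3, 1/3) and (31/3, 31/3). The deterministic search over candidate positions is what removes the sensitivity to random initialization that plain k-means has.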
Clustering methods for big data have attracted considerable interest in recent years. This paper proposes a novel global k-means clustering algorithm (NGKCA). The proposed method comprises four phases: a column dimension reduction phase, a row dimension reduction phase, a global k-means clustering phase and a cluster-center adjustment phase. Column dimension reduction is performed by spectral clustering, while row dimension reduction is performed with particle swarm optimization (PSO). Once both dimension reduction phases are complete, the global k-means clustering phase and the PSO adjustment phase follow. Experiments were carried out on several well-known machine learning data sets and on the standard network security data set KDDCUP99. The results show that NGKCA outperforms several common algorithms reported in the literature and has better time complexity than the global k-means algorithm.
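The final adjustment phase can be sketched as a generic particle swarm search over center positions: each particle encodes all k centers as one flat vector, and its fitness is the sum of squared errors. This is an illustrative sketch, not the paper's method. The name `pso_refine` is hypothetical, and the inertia weight `w`, the coefficients `c1` and `c2`, the swarm size and the jitter `spread` are assumed values that do not come from the paper. One particle is seeded exactly at the input centers (for example, the output of global k-means), so the returned error is never worse than the starting error.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Clustering objective: each point contributes its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def pso_refine(points, centers, n_particles=20, iters=100,
               w=0.72, c1=1.49, c2=1.49, spread=0.5, seed=0):
    """Refine k cluster centers with a basic global-best PSO; returns (centers, sse)."""
    rng = random.Random(seed)
    k, dim = len(centers), len(centers[0])
    flat = [x for c in centers for x in c]  # one particle = all k centers, flattened
    n = len(flat)

    def unflatten(v: List[float]) -> List[Point]:
        return [tuple(v[i * dim:(i + 1) * dim]) for i in range(k)]

    def fitness(v: List[float]) -> float:
        return sse(points, unflatten(v))

    # Particle 0 starts exactly at the input centers; the rest are jittered copies,
    # so the global best can never be worse than the starting solution.
    xs = [list(flat)] + [[x + rng.uniform(-spread, spread) for x in flat]
                         for _ in range(n_particles - 1)]
    vs = [[0.0] * n for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])   # pull toward personal best
                            + c2 * r2 * (gbest[d] - xs[i][d]))     # pull toward swarm best
                xs[i][d] += vs[i][d]
            f = fitness(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = list(xs[i]), f
                if f < gbest_f:
                    gbest, gbest_f = list(xs[i]), f
    return unflatten(gbest), gbest_f
```

This sketch uses the asynchronous global-best variant, in which the swarm best is updated as soon as any particle improves. A local-best topology or velocity clamping would be equally reasonable design choices.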
Source
《计算机科学》 (Computer Science)
CSCD
Peking University Core Journal
2015, Issue 12, pp. 247-250 (4 pages)
Funding
National Natural Science Foundation of China (61272450)
Supported by the Tianjin Science and Technology Support Program (14ZCZDGX00072)