Abstract
To address the slow clustering of incremental big data, this paper proposes a fast clustering algorithm that combines density peaks with representative-point analysis. The sample set is first given an initial clustering based on representative points. Expired (invalid) clustered data are then deleted, and the mean density of each cluster is adjusted accordingly. Next, the representative-points algorithm is used to update the sample set. Finally, the density-peaks algorithm re-clusters the data to update the cluster core points. Experimental analysis shows that the algorithm effectively improves convergence speed. As an application, the algorithm is applied to face clustering over large data volumes, where it improves the clustering results.
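The abstract only outlines the procedure. As a rough illustration, the sketch below shows the generic density-peaks step (local density rho with a cutoff distance d_c, distance delta to the nearest denser point, centers chosen by the largest rho * delta), together with a time-window deletion of expired points and a per-cluster mean-density recomputation. It is not the authors' implementation: the cutoff kernel, the top-k center rule, the `window` expiry rule and all function names are assumptions, and the representative-points update is omitted because the abstract does not specify it.

```python
import math

# Illustrative sketch only: kernel, center rule, expiry rule and names are
# assumptions, not the method described in the paper.


def local_density(points, d_c):
    """Cutoff-kernel local density: rho_i = number of other points closer than d_c."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) < d_c)
        for i in range(n)
    ]


def density_peaks(points, d_c, k):
    """Generic density-peaks clustering; returns (labels, center indices, rho)."""
    n = len(points)
    rho = local_density(points, d_c)
    # Visit points from highest to lowest density; ties broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: pos for pos, i in enumerate(order)}
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            # Convention: the densest point gets its largest distance to any point.
            delta[i] = max(math.dist(points[i], q) for q in points)
            continue
        # delta_i: distance to the closest point that comes earlier in density order.
        j = min(order[:pos], key=lambda m: math.dist(points[i], points[m]))
        delta[i] = math.dist(points[i], points[j])
        nearest_higher[i] = j
    # Centers: the k points with the largest gamma = rho * delta. Ties fall back
    # to density order, so for k >= 1 the densest point is always a center.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:k]
    labels = [-1] * n
    for cid, c in enumerate(centers):
        labels[c] = cid
    # Each remaining point joins the cluster of its nearest denser neighbour;
    # walking in density order guarantees that neighbour is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, rho


def drop_expired(points, stamps, labels, now, window):
    """Hypothetical timeliness rule: keep only points seen within `window` of `now`."""
    keep = [i for i, t in enumerate(stamps) if now - t <= window]
    return ([points[i] for i in keep], [stamps[i] for i in keep],
            [labels[i] for i in keep])


def cluster_mean_density(points, labels, d_c):
    """Mean local density of each cluster, recomputed on the surviving points."""
    sums, counts = {}, {}
    for r, lab in zip(local_density(points, d_c), labels):
        sums[lab] = sums.get(lab, 0) + r
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}
```

For example, on the six 2-D points (0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1) with d_c = 0.5 and k = 2, `density_peaks` returns the labels [0, 0, 0, 1, 1, 1]. The quadratic pairwise-distance loops keep the sketch short. A real incremental system would use a spatial index and would update only the affected clusters.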
Authors
ZHENG Herong
CHEN Ken
PAN Xiang
(College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Published in
Journal of Zhejiang University of Technology
Indexed in: CAS; Peking University Core Journals
2017, No. 4, pp. 427-433 (7 pages)
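Funding
Project of the Science and Technology Department of Zhejiang Province (2016C31G2020061)
Natural Science Foundation of Zhejiang Province (LY15F020024)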
Keywords
timeliness
online clustering
representative points
density mean value