Abstract
To address the high time complexity, large storage requirements, and low accuracy that traditional clustering algorithms exhibit on stream data, this paper proposes a stream data clustering algorithm based on differential sampling. First, the differential sampling method samples the stream data, and the sample points are used to construct a kernel matrix. Next, the kernel fuzzy C-means clustering algorithm clusters the points in the kernel matrix, yielding a labeled sample kernel matrix. Finally, the labeled sample kernel matrix is used to partition the points in the stream. In parallel, a fading cluster mechanism updates the sample kernel matrix in real time. Experimental results show that, compared with traditional clustering algorithms, the proposed algorithm achieves lower time complexity while clustering in real time, and obtains relatively good clustering results.
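The pipeline described in the abstract (sample the stream, cluster the samples with kernel fuzzy C-means, then use the clustered samples to label incoming points) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the differential sampling rule, so `diversity_sample` is a hypothetical stand-in that keeps a point only when it is dissimilar to every sample kept so far. `kfcm` is a common Gaussian-kernel fuzzy C-means variant with centers kept in input space, which may differ from the kernel-matrix formulation in the paper. All function names and the parameters `threshold`, `sigma`, and `m` are assumptions, and the fading update is left out.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel between the rows of a (n, d) and the rows of b (k, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def diversity_sample(stream, threshold=0.5, sigma=1.0):
    """Hypothetical stand-in for differential sampling (an assumption):
    keep a point only if its kernel similarity to every kept sample
    is below `threshold`."""
    kept = [stream[0]]
    for x in stream[1:]:
        sims = gaussian_kernel(x[None, :], np.asarray(kept), sigma)
        if sims.max() < threshold:
            kept.append(x)
    return np.asarray(kept)

def kfcm(samples, c, m=2.0, sigma=1.0, iters=100, tol=1e-6, seed=0):
    """Kernel fuzzy C-means with a Gaussian kernel and input-space centers.
    Returns the centers and the memberships from the last iteration."""
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), size=c, replace=False)]
    for _ in range(iters):
        k = gaussian_kernel(centers, samples, sigma)   # shape (c, n)
        dist = np.maximum(1.0 - k, 1e-12)              # kernel-induced distance
        u = dist ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=0, keepdims=True)              # memberships sum to 1 per point
        w = (u ** m) * k
        new_centers = (w @ samples) / w.sum(axis=1, keepdims=True)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, u

def assign(points, centers, sigma=1.0):
    """Label each point with the cluster whose center is most similar in kernel space."""
    return gaussian_kernel(points, centers, sigma).argmax(axis=1)

# Toy stream: points alternate between a group near (0, 0) and a group near (5, 5).
group_a = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
group_b = group_a + 5.0
stream = np.empty((8, 2))
stream[0::2], stream[1::2] = group_a, group_b

samples = diversity_sample(stream)
centers, memberships = kfcm(samples, c=2)
labels = assign(stream, centers)
```

In this sketch only the retained samples are clustered, so the cost of the clustering step depends on the sample size and not on the stream length, which is the intuition behind the reduced time complexity claimed in the abstract.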
Authors
邱云飞
孙梦冉
Qiu Yunfei; Sun Mengran (College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2019, Issue 6, pp. 1646-1651 (6 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (61404069)
Scientific Research Project of the Education Department of Liaoning Province (LJYL048)
Keywords
differential sampling
fading cluster mechanism
kernel fuzzy C-means
stream data
time complexity