Abstract
To address the time and storage bottlenecks that the traditional Bayesian fuzzy clustering (BFC) algorithm, a probabilistic fuzzy clustering method based on Bayesian theory, faces when clustering large-scale data sets, a single-pass adaptive weighted BFC algorithm based on data block partitioning is proposed. After the large-scale data set is partitioned into blocks, an improved BFC algorithm based on data weighting is designed for clustering within each block; it selects the identification data that are most representative in their contribution to the clustering, together with their adaptive weights. During the iterative clustering across blocks, the identification data and their weights are merged into the next data block and take part in its clustering, so that the clustering information of the previous block is effectively passed on to the next. Finally, the convergence and time complexity of the algorithm are analyzed. Experimental results show that the algorithm inherits the good clustering performance of the traditional BFC algorithm while reducing computational complexity and effectively improving clustering efficiency, which makes it suitable for clustering large-scale data sets.
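The block-to-block hand-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the Bayesian inference details of the weighted BFC step, so a plain weighted fuzzy c-means update is swapped in for it, and the cluster centers stand in for the "identification data". All function names, the fuzzifier `m`, the fixed iteration count, and the choice of the first `k` points as initial centers are assumptions made for illustration only.

```python
def memberships(points, centers, m):
    """Fuzzy memberships u[i][j] of point i in cluster j (each row sums to 1)."""
    u = []
    for p in points:
        # Squared distances, clamped so a point lying on a center cannot divide by zero.
        d2 = [max(sum((a - b) ** 2 for a, b in zip(p, c)), 1e-12) for c in centers]
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u


def weighted_fcm(points, weights, init_centers, m=2.0, iters=30):
    """Weighted fuzzy c-means (a stand-in for the paper's weighted BFC step)."""
    centers = list(init_centers)
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Each point contributes in proportion to its weight times membership^m.
            coef = [w * row[j] ** m for w, row in zip(weights, u)]
            norm = sum(coef)
            new_centers.append(tuple(
                sum(cf * p[t] for cf, p in zip(coef, points)) / norm
                for t in range(dim)))
        centers = new_centers
    # Weight of each center: total weighted membership mass it absorbed.
    u = memberships(points, centers, m)
    center_weights = [sum(w * row[j] for w, row in zip(weights, u))
                      for j in range(len(centers))]
    return centers, center_weights


def single_pass_cluster(data, k, block_size, m=2.0):
    """Visit each block once, carrying weighted centers into the next block."""
    carried_pts, carried_w = [], []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # Merge the carried representatives (with their weights) into this block.
        points = carried_pts + block
        weights = carried_w + [1.0] * len(block)
        # Naive deterministic initialization (assumption): first k points of block one.
        init = carried_pts if carried_pts else block[:k]
        carried_pts, carried_w = weighted_fcm(points, weights, init, m)
    return carried_pts, carried_w


def demo_data():
    """Two well-separated 2-D groups, interleaved so every block sees both."""
    offsets = [-0.4, -0.2, 0.0, 0.2, 0.4]
    data = []
    for dx in offsets:
        for dy in offsets:
            data.append((dx, dy))
            data.append((10.0 + dx, 10.0 + dy))
    return data
```

Calling `single_pass_cluster(demo_data(), k=2, block_size=10)` returns two centers and their accumulated weights. Only one block plus `k` carried points is held in memory at a time, which is the storage saving the abstract refers to, and because memberships sum to one per point, the carried weights always add up to the number of points seen so far.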
Authors
景慎艳
刘松迪
JING Shen-yan; LIU Song-di (Big Data Institute, Liaoning University of International Business and Economics, Dalian 116052, China; Software College, Jilin University, Changchun 130000, China)
Source
《火力与指挥控制》
CSCD
Peking University Core Journal
2021, No. 12, pp. 88-93 (6 pages)
Fire Control & Command Control
Keywords
large-scale data set clustering
data block partition
weighted probabilistic fuzzy clustering
adaptive data weighting
clustering information transfer