Abstract
To handle dynamically growing data sets efficiently and obtain clustering results for incremental data, this paper uses a Gaussian mixture model to fit the data distribution. The means and prior probabilities of the Gaussian components to which the original samples belong are treated as representative-point information for those samples, and their posterior probabilities are relaxed to two states, 0 and 1. Based on the resulting new iteration formulas for the density parameters, an incremental EM algorithm is designed following the standard EM algorithm. This avoids recomputing the posterior probabilities of the original samples while estimating the mixture density parameters, so the clustering results for the incremental data are obtained efficiently and the density parameters of the Gaussian mixture model are updated at the same time. Experimental results show that the incremental EM algorithm handles large-scale incremental data sets efficiently and achieves high clustering accuracy.
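The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's own iteration formulas, which the abstract does not give; it only assumes that each original sample has posterior 1 for its own component and 0 for the others, so the original data enter the update through per-component counts, means, and covariances alone. The function names `log_gauss` and `incremental_em`, the use of full covariance matrices, and the regularization term `reg` are all assumptions made for illustration.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    sol = np.linalg.solve(cov, diff.T).T           # cov^{-1} (x - mean)
    maha = np.sum(diff * sol, axis=1)              # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def incremental_em(n_old, weights, means, covs, X_new, n_iter=50, reg=1e-6):
    """Update a fitted GMM with X_new without revisiting the old samples.

    Assumption (not from the paper's formulas): old posteriors are hardened
    to 0/1, so component k keeps a fixed old count n_old * weights[k].
    """
    K, d = means.shape
    old_cnt = n_old * np.asarray(weights, dtype=float)
    old_mean = np.asarray(means, dtype=float).copy()
    old_cov = np.asarray(covs, dtype=float).copy()
    w, mu, cov = old_cnt / n_old, old_mean.copy(), old_cov.copy()
    n_total = n_old + len(X_new)
    for _ in range(n_iter):
        # E-step: posteriors are computed for the new samples only.
        log_r = np.stack(
            [np.log(w[k]) + log_gauss(X_new, mu[k], cov[k]) for k in range(K)],
            axis=1,
        )
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: combine fixed old statistics with new responsibilities.
        cnt = old_cnt + r.sum(axis=0)
        w = cnt / n_total
        for k in range(K):
            mu[k] = (old_cnt[k] * old_mean[k] + r[:, k] @ X_new) / cnt[k]
            shift = old_mean[k] - mu[k]
            diff = X_new - mu[k]
            scatter_old = old_cnt[k] * (old_cov[k] + np.outer(shift, shift))
            scatter_new = (r[:, k, None] * diff).T @ diff
            cov[k] = (scatter_old + scatter_new) / cnt[k] + reg * np.eye(d)
    labels = r.argmax(axis=1)                      # cluster of each new sample
    return w, mu, cov, labels
```

Each iteration costs time proportional to the number of new samples rather than the full data set, which is the saving the abstract attributes to avoiding repeated posterior computation for the original samples. The sketch assumes every component keeps a positive weight and at least one EM iteration is run.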
Source
《江苏科技大学学报(自然科学版)》
CAS
PKU Core Journals (北大核心)
2011, No. 6, pp. 597-601 (5 pages)
Journal of Jiangsu University of Science and Technology:Natural Science Edition
Funding
Soft Science Project of the Civil Aviation Administration of China (MHRD201007)
Keywords
incremental clustering
EM algorithm
incremental EM algorithm