Abstract
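Data streams evolve over time in a bursty and random manner, and tracking these changes adaptively and in real time is an important problem in data stream mining. Leaving it entirely to users to identify such changes by trial and error is infeasible in practice and defeats the practical purpose of tracking cluster evolution in data streams. To track cluster changes automatically under realistic limits on computing resources and requirements on processing speed, this paper combines fractal clustering, adaptive sampling, and the Chernoff inequality to propose a real-time cluster evolution tracking algorithm for data streams. Experiments on synthetic and real data sets verify the effectiveness of the algorithm.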
Stream data often show important changes in trends over time. In such cases, it is useful to understand, visualize, and diagnose how these trends evolve. When the data streams are fast and continuous, the trends must be analyzed and predicted quickly and online. This paper presents a real-time cluster evolution tracking algorithm for data streams. The algorithm integrates fractal clustering with self-adaptive sampling, operates within limits on computing resources and requirements on processing speed, and detects the cluster evolution of stream data in a timely manner. Experiments on a number of real and synthetic data sets illustrate the effectiveness and efficiency of this approach.
Source
《计算机科学与探索》
CSCD
2010, No. 7, pp. 662-672 (11 pages)
Journal of Frontiers of Computer Science and Technology
Funding
Program for New Century Excellent Talents in University, No. NCET-10-0017
Lanzhou Science and Technology Plan Project, No. 2008-1-28
Keywords
data mining
cluster evolution
fractal
self-adaptive sampling