Abstract
This paper studies and analyzes k-median clustering techniques for data streams, covering: (1) the definitions of the stream model and the k-median problem; (2) the fundamental design decisions and internal mechanisms of k-median clustering on streams; and (3) streaming algorithms with theoretical performance guarantees. For each feature, the technique determines clusters effectively without physically retaining any object of the data stream. To achieve this, clusters are generated dynamically, either by splitting one cluster into two or by merging two adjacent clusters into one. As a result, the clusters found by this technique are more accurate than those found by other conventional methods, and in a data stream environment the technique runs very efficiently while producing high-quality clustering results.
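The abstract does not spell out the algorithm, so the following is only a minimal sketch, under stated assumptions, of the per-feature idea it describes. The k-median goal is to choose k centers so that the sum of distances from each value to its nearest center is as small as possible. Here the stream is summarized by at most k clusters, each stored as a value range and a count rather than the raw values, and the summary changes only by merging two adjacent clusters or splitting one cluster into two. Everything below is hypothetical and is not the paper's method: the names `Cell` and `FeatureClusterer`, the cost estimate n * (hi - lo) / 4 (which assumes values are spread uniformly within each range), and the rule of pairing one split with one merge per insertion.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical summary of one cluster on a single feature: range and count only."""
    lo: float
    hi: float
    n: int

    def cost(self) -> float:
        # Assumption: values are uniform on [lo, hi], so each lies on average
        # (hi - lo) / 4 away from the midpoint (the median of that distribution).
        return self.n * (self.hi - self.lo) / 4

    def center(self) -> float:
        return (self.lo + self.hi) / 2


def merged(a: Cell, b: Cell) -> Cell:
    """Merge two adjacent cells into one covering both ranges."""
    return Cell(min(a.lo, b.lo), max(a.hi, b.hi), a.n + b.n)


class FeatureClusterer:
    """Hypothetical per-feature stream clusterer that keeps at most k cells."""

    def __init__(self, k: int):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.cells: list[Cell] = []  # sorted by lo; ranges touch at most at endpoints

    def _merge_increase(self, i: int) -> float:
        # Estimated cost added by merging cells i and i + 1 (never negative).
        a, b = self.cells[i], self.cells[i + 1]
        return merged(a, b).cost() - a.cost() - b.cost()

    def _merge_at(self, i: int) -> None:
        self.cells[i : i + 2] = [merged(self.cells[i], self.cells[i + 1])]

    def insert(self, x: float) -> None:
        for c in self.cells:
            if c.lo <= x <= c.hi:
                c.n += 1  # absorb the value; only the count changes
                break
        else:
            self.cells.append(Cell(x, x, 1))  # value outside every range: new cell
            self.cells.sort(key=lambda c: c.lo)
        if len(self.cells) > self.k:
            # Over budget: merge the adjacent pair whose merge costs least.
            self._merge_at(min(range(len(self.cells) - 1), key=self._merge_increase))
        self._rebalance()

    def _rebalance(self) -> None:
        # At most one split-plus-merge per insertion: split the most costly cell
        # if that saves more than the cheapest merge of two other cells adds.
        if len(self.cells) < 3:
            return
        w = max(range(len(self.cells)), key=lambda j: self.cells[j].cost())
        if self.cells[w].n < 2:
            return
        saving = self.cells[w].cost() / 2  # halving the width halves the estimate
        pairs = [i for i in range(len(self.cells) - 1) if i != w and i + 1 != w]
        if not pairs:
            return
        i = min(pairs, key=self._merge_increase)
        if self._merge_increase(i) >= saving:
            return
        c = self.cells[w]
        mid = c.center()
        halves = [Cell(c.lo, mid, c.n // 2), Cell(mid, c.hi, c.n - c.n // 2)]
        # Edit the higher index first so the lower index stays valid.
        if w > i:
            self.cells[w : w + 1] = halves
            self._merge_at(i)
        else:
            self._merge_at(i)
            self.cells[w : w + 1] = halves

    def centers(self) -> list[float]:
        return [c.center() for c in self.cells]
```

For example, feeding the values 0, 0, 1, 10, 10, 11 into `FeatureClusterer(k=2)` leaves two cells, [0, 1] and [10, 11], with centers 0.5 and 10.5, while memory stays bounded by k cells no matter how long the stream is. The uniform-within-range assumption is what makes a split possible without stored values. It is a deliberate simplification, and a real system would likely keep richer per-cell statistics.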
Source
Microelectronics & Computer
CSCD
Peking University Core Journal
2006, Supplement 1 (z1), pp. 190-192, 3 pages
Funding
Natural Science Foundation of Fujian Province (A0410011)
Fujian Provincial Special Fund for Science and Technology (2005K007)