Abstract
Cloud computing can deliver data mining results in the form of Software as a Service (SaaS). The performance and quality of data mining are key criteria for data mining applications in a cloud computing environment. This paper proposes a framework for distributing and scheduling a cloud-based data mining application and its data sets. The framework implements a cloud-based K-means clustering method and offers it to users as a cloud SaaS. Its main goals are to reduce the overall execution time of the application and to minimize the loss of mining quality. Simulation results show that, compared with existing schemes, the proposed scheme is significantly faster while keeping the loss of mining quality to a minimum. In addition, mining quality scales well as the number of clusters and the size of the data set increase, which could encourage adoption of the proposed scheme by cloud service providers.
Source
Microcomputer Applications (《微型电脑应用》)
2015, Issue 6, pp. 15-19 (5 pages)
Keywords
Cloud Computing
Data Mining
K-Means Clustering
Overall Execution Time