Abstract
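Weka4WS uses WSRF technology to execute remote data mining and manage distributed computation, supporting distributed data mining tasks. Building on Weka4WS and a grid environment, this paper tries a new distributed clustering method and successfully embeds it in the Weka4WS framework, implementing the distributed data mining algorithm with the Weka Library and introducing the concepts of distance-cost and admixture probability. By combining grid and Web service technologies, with a distributed problem-solving environment and the open-source data mining library Weka as the underlying support, it builds a service-oriented distributed data mining architecture for grid environments, and it validates the effectiveness of the algorithm and the feasibility of the architecture with a Weka4WS-based distributed clustering algorithm.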
Weka4WS adopts WSRF technology to run remote data mining algorithms and manage distributed computations; a WSRF-compliant Web service exposes all the data mining algorithms provided by the Weka Library. This paper describes Weka4WS, a framework that extends the widely used open-source Weka toolkit to support distributed data mining on WSRF-enabled grids, and attempts a new approach to the problem of distributed clustering. It introduces the concepts of distance-cost and admixture probability, implements the distributed clustering algorithm with the Weka Library, and designs a service-oriented distributed data mining architecture for grid environments that combines grid and Web service technologies. Weka4WS is implemented using the WSRF libraries and services provided by Globus Toolkit 4. Finally, the effectiveness of the algorithm and the feasibility of the architecture are validated with distributed clustering based on Weka4WS.
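The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of the general idea of distributed clustering, not the paper's method. It assumes a generic two-stage scheme: each site runs k-means on its own data, and a coordinator then clusters the weighted local centroids. The distance-cost and admixture probability concepts are not modelled, no Weka or WSRF API is used, and the names `kmeans` and `distributed_kmeans` are hypothetical.

```python
from math import dist


def kmeans(points, k, weights=None, iters=20):
    """Plain Lloyd's k-means on weighted points.

    Returns (centroids, mass), where mass[j] is the total weight assigned
    to centroid j. Initialisation is deterministic (first k points), which
    is an assumption made to keep the sketch reproducible.
    """
    if weights is None:
        weights = [1.0] * len(points)
    dim = len(points[0])
    centroids = [tuple(points[i]) for i in range(k)]
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        # Assignment step: each point goes to its nearest centroid.
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: dist(p, centroids[c]))
            mass[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Update step: weighted mean of the assigned points.
        new_centroids = []
        for j in range(k):
            if mass[j] > 0:
                new_centroids.append(tuple(x / mass[j] for x in sums[j]))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        centroids = new_centroids
    return centroids, mass


def distributed_kmeans(sites, k):
    """Cluster locally at each site, then merge the summaries centrally.

    Only centroids and their point counts leave a site; the raw data stays
    where it is.
    """
    summaries, counts = [], []
    for local_points in sites:
        local_centroids, local_mass = kmeans(local_points, k)
        summaries.extend(local_centroids)
        counts.extend(local_mass)
    global_centroids, _ = kmeans(summaries, k, weights=counts)
    return sorted(global_centroids)


if __name__ == "__main__":
    site_a = [(0, 0), (0, 2), (10, 10), (10, 12)]
    site_b = [(2, 0), (2, 2), (12, 10), (12, 12)]
    print(distributed_kmeans([site_a, site_b], k=2))
```

In a grid deployment the per-site call would presumably run as a remote service invocation on the node holding the data, with only the small centroid summaries sent back to the coordinator.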
Source
《计算机应用研究》
CSCD
PKU Core Journal
2010, Issue 11, pp. 4072-4075 (4 pages)
Application Research of Computers
Funding
National "863" Program of China (2007AA01Z126)
General Armament Department Weapons and Equipment Pre-research Fund (9140A06050409JB8102)
Keywords
grid
distributed
clustering
data mining