Abstract
Feature mining of big data in cloud computing environments underpins big data statistics and analysis. To improve clustering accuracy and speed, this paper designs a data mining scheme based on the distributed Hadoop platform and entropy-weighted feature selection. The scheme first uses a directed acyclic graph (DAG) to analyze the scheduling of MapReduce job flows on Hadoop, then carries out the distributed computation through parallel MapReduce execution, and finally mines the massive data with an entropy-weighted clustering algorithm. Simulation results show that the proposed scheme achieves good clustering quality and running efficiency.
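The abstract does not give the weighting formula, so the following is only a minimal sketch of one common way to derive entropy-based feature weights (the entropy weight method) and use them in a distance measure for clustering. The function names `entropy_feature_weights` and `weighted_distance`, the min-max scaling step, and the single-machine, non-MapReduce setting are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def entropy_feature_weights(rows):
    """Return one weight per feature column (hypothetical helper).

    Assumption: weights follow the generic entropy weight method, where a
    feature whose values are spread unevenly (low entropy) gets more weight.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples")
    m = len(rows[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant column carries no discriminating information.
            divergences.append(0.0)
            continue
        # Min-max scale, then turn the column into a probability distribution.
        scaled = [(v - lo) / (hi - lo) for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Shannon entropy normalized to [0, 1] by log(n).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    if total_div == 0:
        # Every column is constant: fall back to uniform weights.
        return [1.0 / m] * m
    return [d / total_div for d in divergences]


def weighted_distance(a, b, weights):
    """Feature-weighted Euclidean distance between two samples."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


if __name__ == "__main__":
    data = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    w = entropy_feature_weights(data)
    print(w, weighted_distance(data[0], data[2], w))
```

In a Hadoop setting the per-column sums needed here could plausibly be computed in a map and reduce pass, but how the paper distributes this step is not stated in the abstract.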
Authors
何婕
赖敏
Jie HE; Min LAI (College of Electronic Information Engineering, Chongqing Radio and Television University, Chongqing 401520, China; Chongqing Institute of Engineering, College of Software Engineering & Computer Science, Chongqing 401320, China)
Source
《机床与液压》
Peking University Core Journal (北大核心)
2018, Issue 24, pp. 144-149 (6 pages)
Machine Tool & Hydraulics
Funding
Chongqing Science and Technology Research Project of the Education Commission (KJ1737458)