Abstract
To address the low mining efficiency and memory overflow of the traditional FP-Growth algorithm on large-scale data, a new parallel FP-Growth algorithm is proposed on the basis of the traditional algorithm and implemented in parallel under the MapReduce programming model of the Hadoop distributed computing framework. Experimental data show that, compared with the traditional FP-Growth algorithm, the parallel FP-Growth algorithm takes less computation time for the same amount of data and processes more data in the same amount of time. Under certain conditions, it also resolves the memory overflow problem in large-scale data mining.
Source
《河北工业科技》
CAS
2016, Issue 2, pp. 169-177 (9 pages)
Hebei Journal of Industrial Science and Technology