Abstract
To improve data mining efficiency, a distributed algorithm for mining frequent closed patterns, PFCI-Miner, is proposed. The algorithm distributes tasks in a master-slave fashion: the master processor partitions the mining task by sending the proposed prefix path table (PrePthx) to the slave processors, each slave processor mines local frequent closed patterns with the help of the proposed storage tree (Trac-tree), and the master processor finally mines the global frequent closed patterns. A star topology is used, so data communication occurs only between the master processor and the slave processors; the slave processors exchange no data with one another and need no synchronization. Experiments on synthetic and mushroom data sets in a distributed environment of 3 PCs show that, compared with the DP-FP algorithm, the AFCIM (adaptive frequent closed itemsets mining model) algorithm, and the DFCIM (distributed frequent closed itemsets mining) algorithm, PFCI-Miner improves execution efficiency on average by 43.66%, 42.17%, and 53.48% and by 51.86%, 47.62%, and 62.78%, respectively.
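The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the PFCI-Miner implementation: the PrePthx table and the Trac-tree are not specified here, so the sketch swaps in plain brute-force enumeration, and the function names `mine_prefix`, `closed_only`, and `mine_closed` are hypothetical. It assumes tasks are split by prefix item, that each task can run without talking to any other task, and that a final master step filters the merged results for closedness.

```python
from itertools import combinations


def mine_prefix(transactions, prefix, min_support):
    """Worker task (hypothetical): frequent itemsets whose smallest item is `prefix`."""
    # Projected database: transactions containing the prefix, restricted to later items.
    projected = [frozenset(i for i in t if i > prefix)
                 for t in transactions if prefix in t]
    result = {}
    if len(projected) < min_support:
        return result
    result[frozenset([prefix])] = len(projected)
    candidates = sorted(set().union(*projected))
    for size in range(1, len(candidates) + 1):
        found = False
        for combo in combinations(candidates, size):
            items = frozenset(combo)
            support = sum(1 for t in projected if items <= t)
            if support >= min_support:
                result[items | {prefix}] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result


def closed_only(itemsets):
    """Master step: keep itemsets that have no proper superset with equal support."""
    return {x: s for x, s in itemsets.items()
            if not any(x < y and s == sy for y, sy in itemsets.items())}


def mine_closed(transactions, min_support):
    """Split work by prefix item, merge the partial results, then filter globally."""
    transactions = [frozenset(t) for t in transactions]
    merged = {}
    for prefix in sorted(set().union(*transactions)):
        # Each call is independent of the others and could run on a separate worker.
        merged.update(mine_prefix(transactions, prefix, min_support))
    return closed_only(merged)


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c"}, {"c"}]
    for itemset, support in mine_closed(data, min_support=2).items():
        print(sorted(itemset), support)
```

The closedness filter runs only after merging because a superset with equal support may start with an earlier item and therefore belong to another worker's partition.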
Source
Journal of Southwest Jiaotong University
EI
CSCD
Peking University Core Journal
2012, No. 6, pp. 1027-1033 (7 pages)
Funding
Natural Science Foundation of Shaanxi Province (Grant No. 2009JM7007)
Keywords
association rule
data mining
frequent closed pattern