A Stable Parallel Distributed Frequent Itemset Mining Algorithm and Its Application

Abstract: To address frequent itemset mining in large-scale medical data analysis, this paper proposes P-FIM, a stable parallel distributed algorithm with good scalability. The algorithm splits the mining task into homogeneous subtasks with no mutual dependencies, which enables effective parallel computation, and it exploits the Map/Reduce framework and the cluster environment to improve its robustness and load balancing. Performance experiments on traditional Chinese medicine (TCM) prescription data of up to 5.12 million records show that the algorithm runs stably in a distributed cluster environment and that its speedup approaches linear as the cluster grows. A TCM data correlation analysis scheme designed and implemented on top of P-FIM can effectively extract comprehensive and reliable information about the correlations among diseases, symptoms, and medicines from large-scale clinical data.
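The abstract does not give the internals of P-FIM, so the following is only a minimal single-process sketch of the general idea it describes: counting itemsets in independent partitions (a map step) and merging the partial counts (a reduce step). The function names, the `max_size` limit, and the sample prescription records are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def map_phase(partition, max_size):
    """Count every itemset of size 1..max_size within one data partition.

    Each call only reads its own partition, so calls are independent
    and could run on separate workers (assumption: brute-force counting,
    not the paper's actual subtask design).
    """
    counts = Counter()
    for record in partition:
        items = sorted(set(record))  # canonical order, duplicates removed
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def reduce_phase(partial_counts, min_support):
    """Sum the per-partition counts and keep itemsets meeting min_support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def mine_frequent_itemsets(records, num_partitions, min_support, max_size=2):
    """Split records into partitions, map each one, then reduce."""
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    partial = [map_phase(p, max_size) for p in partitions]
    return reduce_phase(partial, min_support)


# Hypothetical prescription records: each record is a list of herbs.
records = [
    ["ginseng", "licorice", "ginger"],
    ["ginseng", "licorice"],
    ["licorice", "ginger"],
    ["ginseng", "licorice", "ginger"],
]
frequent = mine_frequent_itemsets(records, num_partitions=2, min_support=3)
```

Because the partial counts are simply summed, the result does not depend on how the records are partitioned. In a real Map/Reduce deployment the framework would schedule the map calls across cluster nodes and handle retries; this sketch runs them sequentially.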

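Authors: 秘中凯, 姜晓红, 雷蕾

Affiliations: College of Computer Science and Technology, Zhejiang University; Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences

Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2011, No. 3, pp. 83-85, 124 (4 pages)

Funding: National High Technology Research and Development Program of China (2006AA01A123); Distinguished Young Scholars Fund (NSFC60525202)

Keywords: Data mining; Frequent itemset mining; Map/Reduce parallel framework; Analysis of medicine data

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]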

References (6)

1. Agrawal R, Srikant R. Fast algorithms for mining association rules. Santiago, Chile: Very Large Data Bases (VLDB '94): 487-499.
2. Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters[C]//Proc. of the 6th OSDI (Dec. 2004): 137-150.
3. Agrawal R, Shafer J C. Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering, 1996(8): 962-969.
4. Ye Y, Chiang C C. A parallel apriori algorithm for frequent itemsets mining[C]//SERA, 2006.
5. Liu Li, Li Eric, Zhang Yimin. Optimization of frequent itemset mining on multiple-core processor[C]//VLDB, 2007.
6. Li H, Wang Y, Zhang D, et al. PFP: Parallel FP-Growth for Query Recommendation. ACM Recommender Systems, 2008.

