Top-k high utility pattern mining algorithm based on MapReduce

Cited by: 7
Abstract: High utility pattern mining is widely applied in data mining. To mine a specified number of high utility patterns, several top-k high utility pattern mining algorithms based on tree structures and utility-list structures have been proposed. However, the tree-based algorithms generate a large number of candidate patterns during mining, and the list-based algorithms require many comparisons as utility patterns grow. Moreover, because data volumes are growing explosively in the information society, mining high utility patterns from very large datasets comes at the cost of substantial storage space and computation. To address these two problems, this paper proposes a top-k high utility pattern mining algorithm based on MapReduce, called TKHUP_MaR. The algorithm scans the database twice and uses three MapReduce phases to mine top-k high utility patterns in parallel. Experimental results show that TKHUP_MaR is effective for mining top-k high utility patterns in a parallel environment.
Source: Application Research of Computers (《计算机应用研究》), CSCD, PKU Core; 2017, No. 10, pp. 2897-2900, 2932 (5 pages)
Funding: National Natural Science Foundation of China (61370108)
Keywords: data mining; top-k; high utility pattern; MapReduce; parallel algorithm
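
To make the abstract's terms concrete, the following is a minimal sketch of what "top-k high utility patterns" means, written as a single map step and a single reduce step simulated in one Python process. It is not TKHUP_MaR: the record does not describe the algorithm's two scans or three MapReduce phases in detail, so this brute-force version enumerates every itemset and applies no pruning. The data (`PROFIT`, `TRANSACTIONS`) and all function names are hypothetical, and the utility definition (purchased quantity times unit profit, summed over the transactions containing the itemset) is the standard one from the high utility mining literature, assumed here.

```python
from collections import defaultdict
from itertools import combinations
import heapq

# Hypothetical toy data (not from the paper): unit profit per item,
# and transactions that map each item to its purchased quantity.
PROFIT = {"a": 5, "b": 2, "c": 1}
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]


def map_phase(transaction, profit):
    """Emit (itemset, utility of that itemset in this transaction) pairs."""
    items = sorted(transaction)
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            yield itemset, sum(transaction[i] * profit[i] for i in itemset)


def reduce_phase(pairs):
    """Sum the per-transaction utilities of each itemset."""
    totals = defaultdict(int)
    for itemset, utility in pairs:
        totals[itemset] += utility
    return totals


def top_k_high_utility(transactions, profit, k):
    """Return the k itemsets with the highest total utility."""
    pairs = (pair for t in transactions for pair in map_phase(t, profit))
    totals = reduce_phase(pairs)
    # Highest utility first; ties are broken by itemset for determinism.
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for itemset, utility in top_k_high_utility(TRANSACTIONS, PROFIT, 3):
        print(itemset, utility)
```

Enumerating all itemsets is exponential in transaction length, which is the cost that candidate pruning and parallel execution are meant to reduce on large datasets.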
