PageRank Matrix Partitioned Algorithm Using Hadoop-MapReduce (Hadoop-MapReduce下的PageRank矩阵分块算法)

Cited by: 13
Abstract: PageRank is a classic algorithm for Web structure mining and has been highly successful in the Google search engine. However, it requires many iterations, consumes considerable time and space, and is slow in both execution and convergence. After discussing in detail the execution flow of Hadoop-MapReduce and its internal implementation mechanisms, this paper proposes a PageRank algorithm that implements matrix partitioning in parallel with MapReduce. In essence, it reduces the number of iterations of the Map and Reduce phases in the MapReduce framework, and thereby reduces time and space overhead. Finally, an open-source Hadoop-MapReduce platform is set up, Web structure crawling is simulated, and the performance of the traditional and improved algorithms is compared. The results show that the improved algorithm needs fewer iterations and achieves higher parallel efficiency, and that its PageRank-based page ranking performs better in the simulated environment.
Source: Computer Technology and Development (《计算机技术与发展》), 2011, No. 8, pp. 6-9, 13 (5 pages)
Funding: Natural Science Foundation of Yunnan Province (2007F174M); Yunnan University Graduate Research Project (ynny200928)
Keywords: PageRank; MapReduce; Hadoop; partitioned matrix
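
The abstract does not give the paper's actual partitioning scheme, so the following is only a minimal sketch of the general idea it names: a PageRank power iteration in which the transition matrix is split into blocks, each block is multiplied by its slice of the rank vector in a map step, and the partial products are summed per row block in a reduce step. Everything here is an assumption made for illustration: the function names (`build_blocks`, `map_block`, `reduce_row_block`, `pagerank`), the block size, the damping factor of 0.85, and the four-page toy graph are hypothetical, and the code runs in a single Python process rather than on Hadoop.

```python
from collections import defaultdict


def build_blocks(links, block_size):
    """Split the column-stochastic transition matrix into square blocks.

    links maps each page to the pages it links to; entry (dst, src) is
    1/outdegree(src). Only non-zero entries are stored, keyed by block.
    """
    blocks = defaultdict(list)  # (row_block, col_block) -> [(dst, src, weight)]
    for src, outs in links.items():
        for dst in outs:
            key = (dst // block_size, src // block_size)
            blocks[key].append((dst, src, 1.0 / len(outs)))
    return blocks


def map_block(key, entries, rank):
    """Map step: multiply one matrix block by the rank entries it needs."""
    row_block, _ = key
    partial = defaultdict(float)
    for dst, src, weight in entries:
        partial[dst] += weight * rank[src]
    return row_block, partial


def reduce_row_block(partials):
    """Reduce step: sum the partial products that share a row block."""
    total = defaultdict(float)
    for partial in partials:
        for dst, value in partial.items():
            total[dst] += value
    return total


def pagerank(links, n, block_size=2, damping=0.85, tol=1e-10, max_iter=200):
    """Iterate until the L1 change in the rank vector drops below tol."""
    blocks = build_blocks(links, block_size)
    rank = [1.0 / n] * n
    iteration = 0
    for iteration in range(1, max_iter + 1):
        grouped = defaultdict(list)  # shuffle: group map output by row block
        for key, entries in blocks.items():
            row_block, partial = map_block(key, entries, rank)
            grouped[row_block].append(partial)
        new_rank = [(1.0 - damping) / n] * n
        for partials in grouped.values():
            for dst, value in reduce_row_block(partials).items():
                new_rank[dst] += damping * value
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank, iteration


# Toy graph (assumed): every page has at least one outgoing link,
# so no special handling of dangling pages is needed.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
ranks, iterations = pagerank(links, n=4)
```

In this sketch one pass over all blocks corresponds to one Map/Reduce round, so `iterations` counts the rounds needed to converge. It does not reproduce the iteration savings reported in the abstract, which depend on details of the authors' algorithm that are not available here.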