
Parallel Approach in Data Mining Based on Hadoop Cloud Platform
Cited by: 38
Abstract: Industry has begun to use cloud platforms to process massive high-dimensional data, presenting a variety of heterogeneous systems as a single simulated system. Data mining in a Hadoop environment, however, runs into several problems, including the global nature of data models, random write operations on HDFS files, and short data lifecycles. To address these problems and enable efficient large-scale data mining on Hadoop, an efficient data mining framework for Hadoop is proposed. The framework uses a database to simulate a linked-list structure for managing the mined knowledge, and provides distributed computation methods for tree structures and graph models. On this basis, a statistical algorithm (Yscore binning) is implemented, along with tree-building algorithms for decision trees and KD-trees. The Vega cloud is used to simulate the Hadoop cluster. Experimental data show that the framework and algorithms are practical and feasible, and that they may extend to fields beyond data mining.
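The idea of using a database to simulate a linked-list structure can be illustrated with a small sketch. The abstract does not give the schema, and the keywords suggest the paper works through JPA, so everything below is an assumption made for illustration: the table names `node` and `list_head`, the helper names `open_store`, `append`, and `traverse`, and the use of Python's built-in `sqlite3` module as a stand-in for the real database. The point it shows is that a "next pointer" stored as a column can be updated in place, which files written sequentially to HDFS do not readily allow.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database and create the (hypothetical) linked-list tables."""
    conn = sqlite3.connect(path)
    # Each row is one list node; next_id plays the role of the "next" pointer.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " next_id INTEGER REFERENCES node(id))"
    )
    # One row per named list, remembering its first and last node.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS list_head ("
        " name TEXT PRIMARY KEY,"
        " head_id INTEGER,"
        " tail_id INTEGER)"
    )
    return conn


def append(conn, list_name, payload):
    """Append a piece of mined knowledge to the end of a named list."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO node (payload, next_id) VALUES (?, NULL)", (payload,)
        )
        new_id = cur.lastrowid
        row = conn.execute(
            "SELECT tail_id FROM list_head WHERE name = ?", (list_name,)
        ).fetchone()
        if row is None:
            # First node: it is both head and tail of the new list.
            conn.execute(
                "INSERT INTO list_head (name, head_id, tail_id) VALUES (?, ?, ?)",
                (list_name, new_id, new_id),
            )
        else:
            # Link the old tail to the new node, then move the tail pointer.
            conn.execute(
                "UPDATE node SET next_id = ? WHERE id = ?", (new_id, row[0])
            )
            conn.execute(
                "UPDATE list_head SET tail_id = ? WHERE name = ?",
                (new_id, list_name),
            )
    return new_id


def traverse(conn, list_name):
    """Follow next_id pointers from the head and return payloads in order."""
    row = conn.execute(
        "SELECT head_id FROM list_head WHERE name = ?", (list_name,)
    ).fetchone()
    current = row[0] if row else None
    items = []
    while current is not None:
        payload, current = conn.execute(
            "SELECT payload, next_id FROM node WHERE id = ?", (current,)
        ).fetchone()
        items.append(payload)
    return items
```

For example, after `conn = open_store()`, two calls to `append(conn, "rules", ...)` followed by `traverse(conn, "rules")` return the payloads in insertion order. A production version would likely need concurrency control across cluster nodes, which this sketch leaves out.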
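Source: Journal of System Simulation (《系统仿真学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2013, No. 5, pp. 936-944 (9 pages)
Funding: National Natural Science Foundation of China (61035003, 61072085, 61202212, 60933004); National 973 Program (2013CB329502); National 863 High-Tech Research and Development Program (2012AA011003); National Science and Technology Support Program (2012BA107B02)
Keywords: parallel data mining; decision tree algorithm; KD-tree algorithm; JPA; cloud computing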
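For readers unfamiliar with the KD-tree named in the keywords, the following is a minimal single-machine sketch of the textbook median-split construction. It is not the distributed building algorithm from the paper, whose details are not given in the abstract; the names `KDNode` and `build_kd_tree` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # the point stored at this node
    axis: int                         # dimension used to split at this node
    left: Optional["KDNode"] = None   # points with smaller coordinate on axis
    right: Optional["KDNode"] = None  # points with larger or equal coordinate


def build_kd_tree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursively build a KD-tree by splitting at the median point."""
    if not points:
        return None
    # Cycle through the dimensions as the tree gets deeper.
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kd_tree(ordered[:mid], depth + 1),
        right=build_kd_tree(ordered[mid + 1:], depth + 1),
    )
```

Calling `build_kd_tree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])` places `(7, 2)` at the root, since it is the median along the first axis. Each recursive call sorts its own subset, which keeps the sketch short but is not the most efficient way to build the tree.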