A Fault-Tolerant Grid Structure for Reliable Computing (cited by 7)

Study of a Fault-tolerant Grid Framework for Dependable Grid Computing

Abstract: Grid resources are distributed, volatile, and heterogeneous, so faults occur in a grid computing environment with a much higher probability than in traditional cluster systems. Node faults in particular are unpredictable, which makes them harder to detect and recover from. To execute applications reliably in a grid environment, this paper studies fault-tolerance techniques for grid computing and proposes a fault-tolerant grid architecture based on distributed fault detection. Building on the Heartbeat Monitor (HBM) in Globus, it describes how network faults, grid node faults, and process faults are detected, and how an application resumes execution after each of them. Because the three fault types behave differently in a grid, a separate recovery mechanism is designed for each, yielding a fault-tolerant grid structure oriented toward parallel computing. The goal is reliable application execution at low cost: with these strategies, users can recover or adjust a computation with small overhead while keeping performance high.
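The abstract names heartbeat-based detection (Globus HBM) and a separate recovery path for process, node, and network faults, but gives no algorithm. The Python sketch below shows one way such a classifier could be organized; it is not the paper's method and does not use the HBM API. The `HeartbeatMonitor` class, the timeout, the "all peers at the same site are silent, so suspect the network" heuristic, and the `RECOVERY` actions are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Fault(Enum):
    NONE = "none"
    PROCESS = "process"
    NODE = "node"
    NETWORK = "network"


# Hypothetical recovery actions, one per fault type (assumed, not from the paper).
RECOVERY = {
    Fault.NONE: "continue",
    Fault.PROCESS: "restart the process on the same node from its last checkpoint",
    Fault.NODE: "reschedule the task on another node and resume from its last checkpoint",
    Fault.NETWORK: "wait for reconnection within a grace period, then handle as a node fault",
}


@dataclass
class HeartbeatMonitor:
    """Hypothetical collector: records the last heartbeat time of each node and
    each process, and maps prolonged silence to a suspected fault type."""
    timeout: float   # seconds of silence before a component is suspected
    site_of: dict    # node name -> site (network segment) name
    last_node_beat: dict = field(default_factory=dict)   # node -> time
    last_proc_beat: dict = field(default_factory=dict)   # (node, pid) -> time

    def node_beat(self, node, now):
        self.last_node_beat[node] = now

    def proc_beat(self, node, pid, now):
        self.last_proc_beat[(node, pid)] = now
        # A process heartbeat also shows that its node is up and reachable.
        self.node_beat(node, now)

    def _silent(self, last_seen, now):
        # Components never heard from count as silent.
        return last_seen is None or now - last_seen > self.timeout

    def classify(self, node, pid, now):
        if not self._silent(self.last_node_beat.get(node), now):
            # The node is reachable, so a silent process is a process fault.
            if self._silent(self.last_proc_beat.get((node, pid)), now):
                return Fault.PROCESS
            return Fault.NONE
        peers = [n for n, s in self.site_of.items()
                 if s == self.site_of[node] and n != node]
        # Heuristic (assumed): if every peer at the same site is silent too,
        # suspect the network path to the site rather than this one node.
        if peers and all(self._silent(self.last_node_beat.get(p), now) for p in peers):
            return Fault.NETWORK
        return Fault.NODE


def demo():
    """Hypothetical scenario: two sites with two nodes each, one task per node."""
    site_of = {"a1": "siteA", "a2": "siteA", "b1": "siteB", "b2": "siteB"}
    mon = HeartbeatMonitor(timeout=10.0, site_of=site_of)
    for node, pid in [("a1", 101), ("a2", 201), ("b1", 301), ("b2", 401)]:
        mon.proc_beat(node, pid, now=0.0)
    mon.node_beat("a1", now=15.0)  # a1 keeps reporting; its process 101 has gone quiet
    return {pid: mon.classify(node, pid, now=20.0)
            for node, pid in [("a1", 101), ("a2", 201), ("b1", 301)]}


if __name__ == "__main__":
    for pid, fault in demo().items():
        print(pid, fault.value, "->", RECOVERY[fault])
```

In this scenario task 101 is classified as a process fault (its node still reports), task 201 as a node fault (its site peer `a1` is alive), and task 301 as a network fault (the whole of `siteB` is silent). Heartbeats alone cannot strictly tell a crashed node from an unreachable one, so the peer check is only a heuristic; that is why the assumed network-fault action falls back to node-fault handling after a grace period.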

Authors: 邱敏 (Qiu Min), 桂小林 (Gui Xiaolin)

Affiliation: Department of Computer Science and Technology, Xi'an Jiaotong University

Source: Microelectronics & Computer (《微电子学与计算机》), 2005, No. 7, pp. 99-102, 106 (5 pages); indexed in CSCD and the Peking University core journal list

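Funding: National Natural Science Foundation of China (60273085); National High Technology Research and Development Program of China (863 Program, 2001AA111081); Ministry of Education ChinaGrid Program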

Keywords: Fault-tolerant computing; Grid computing; Reliability; Fault detection; Fault recovery

CLC number: TP393.02 [Automation and Computer Technology / Computer Application Technology]
