
Research on User-Oriented Availability Modeling in Parallel Computer Systems (面向用户的并行计算机系统可用性建模研究)

Cited by: 4
Abstract: As parallel computer systems grow in scale, the dependability of both the system and the tasks it runs faces great challenges. Availability covers reliability and serviceability, which makes it the core measure of a massively parallel computer system's ability to deliver correct service, and evaluating it quantitatively gives strong support to system analysis and design. Taking task characteristics and the fault-tolerance strategy into account, this paper uses stochastic activity networks to build user-oriented availability models of a parallel computer system for two examples: a capability computing application with frequent communication among nodes, and a capacity computing application without communication. Built on a node module and a network module, the models describe the state of task execution and use the ratio of useful work during execution as the availability measure. They cover the main factors that influence the availability of a parallel computer system, including failures, hierarchical fault tolerance, fault detection, application characteristics, repair strategy, and fault coverage ratio. The models are then solved and analyzed with actual data. Different applications on the same system can present users with markedly different availability characteristics, and the user-oriented models quantify this difference fairly precisely.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2008, No. 5, pp. 886-894 (9 pages)
Funding: National "973" Key Basic Research Development Program of China (2007CB310900)
Keywords: availability; quantitative model; stochastic activity networks; fault tolerance; user-oriented
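
The abstract's availability measure, the ratio of useful work during execution, can be illustrated with a toy simulation. This is a minimal sketch only and not the paper's stochastic activity network model: it assumes a checkpoint/restart job with exponentially distributed failures, and the function name `useful_work_ratio`, its parameters, and all numeric values are hypothetical. A tightly coupled job is assumed to stop whenever any one of its nodes fails, while an independent task is assumed to see only its own node's failures.

```python
import random


def useful_work_ratio(mtbf, repair_time, checkpoint_interval,
                      checkpoint_cost, horizon, seed=0):
    """Estimate the fraction of wall-clock time spent on saved, useful work.

    Hypothetical toy model: work runs in segments of `checkpoint_interval`,
    each followed by a checkpoint costing `checkpoint_cost`. A failure before
    the checkpoint completes discards the segment and costs `repair_time`.
    """
    rng = random.Random(seed)
    clock = 0.0
    useful = 0.0
    next_failure = rng.expovariate(1.0 / mtbf)
    while clock < horizon:
        segment_end = clock + checkpoint_interval + checkpoint_cost
        if next_failure >= segment_end:
            # Segment and checkpoint both finished, so the work is kept.
            useful += checkpoint_interval
            clock = segment_end
        else:
            # Failure hit first: lose the segment, repair, then restart.
            clock = next_failure + repair_time
            next_failure = clock + rng.expovariate(1.0 / mtbf)
    return useful / clock


if __name__ == "__main__":
    nodes = 100            # assumed node count
    node_mtbf = 10_000.0   # assumed per-node mean time between failures (hours)
    # Coupled job: any node failure interrupts it, so its MTBF is node_mtbf / nodes.
    coupled = useful_work_ratio(node_mtbf / nodes, 2.0, 9.0, 1.0, 100_000.0)
    # Independent task: only its own node's failures matter.
    independent = useful_work_ratio(node_mtbf, 2.0, 9.0, 1.0, 100_000.0)
    print(f"coupled job:      {coupled:.3f}")
    print(f"independent task: {independent:.3f}")
```

Under these assumptions the same hardware yields a lower useful work ratio for the coupled job than for the independent task, which is the kind of user-visible difference the abstract describes.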


