
Improvement of Recovery Mechanism for Lustre Metadata Service (cited by: 1)
Abstract: Lustre's reboot recovery algorithm requires every client in the cluster to reconnect to the server within a specified recovery time window. Clients then resend their uncommitted transactional requests, and the server replays all uncommitted transactions strictly in transaction-number order. These conditions are overly strict. To address Lustre's weak recoverability, this paper proposes two algorithms, version-based recovery and commit-on-share, which respectively improve and extend Lustre's existing metadata update and recovery mechanisms. Based on the dependencies between transactions, they allow clients to recover and rejoin the cluster under more relaxed conditions without being evicted, which improves the availability and recoverability of the Lustre file system. Finally, the performance of the improved algorithms is evaluated through a series of experiments.
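The contrast between strict transaction-number replay and dependency-aware replay can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not spell out the version check, so the sketch assumes that each object carries a version equal to the transaction number of its last update, and that each resent request records the versions its objects had before the update. All names (`Request`, `strict_replay`, `version_based_replay`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    transno: int        # transaction number assigned before the crash
    client: str         # client that issued (and would resend) the request
    pre_versions: dict  # assumed: object id -> version seen before this update


def strict_replay(requests, last_committed):
    """Replay in transno order and stop at the first gap in the sequence."""
    replayed = []
    expected = last_committed + 1
    for req in sorted(requests, key=lambda r: r.transno):
        if req.transno != expected:
            break  # a missing client left a gap; later requests are not replayed
        replayed.append(req.transno)
        expected += 1
    return replayed


def version_based_replay(requests, versions):
    """Replay in transno order, skipping only requests whose recorded
    pre-update versions no longer match the current object versions."""
    versions = dict(versions)  # work on a copy of the committed state
    replayed = []
    for req in sorted(requests, key=lambda r: r.transno):
        if all(versions.get(obj) == v for obj, v in req.pre_versions.items()):
            replayed.append(req.transno)
            for obj in req.pre_versions:
                versions[obj] = req.transno  # the update sets a new version
    return replayed


# Committed state before the crash: last committed transno is 10.
committed_versions = {"dirA": 8, "dirB": 10}

# Transaction 12 (client c2, touching dirB) is never resent: c2 failed to reconnect.
resent = [
    Request(11, "c1", {"dirA": 8}),
    Request(13, "c3", {"dirA": 11}),
    Request(14, "c3", {"dirB": 12}),
]

strict_result = strict_replay(resent, last_committed=10)
vbr_result = version_based_replay(resent, committed_versions)
print(strict_result, vbr_result)
```

In this example, strict replay stops at the gap left by the missing transaction 12, while the version check still replays transaction 13 because it depends only on `dirA`. Transaction 14 is skipped in both cases because it depends on the update that was lost.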
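Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 9, pp. 177-182 (6 pages)
Funding: National 973 Program of China (2009CB723803); National Natural Science Foundation of China (60873120)
Keywords: Lustre, high-performance computing (HPC), recoverability, availability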

