Fatman: Building Reliable Archival Storage Based on Low-Cost Volunteer Resources

Abstract: We present Fatman, an enterprise-scale archival storage system built on volunteer resources contributed by underutilized web servers, typically deployed on thousands of nodes with spare storage capacity. Fatman is designed to improve the utilization of existing storage resources and to cut hardware purchase costs. The two major design concerns are maximizing the resource utilization of volunteer nodes without violating service level objectives (SLOs), and minimizing cost without reducing the availability of the archival system. Fatman has been widely deployed on tens of thousands of server nodes across several datacenters, providing more than 100 PB of storage capacity and serving dozens of internal mass-data applications. The system consolidates storage quota efficiently through strong isolation and budget limitation, maximizing resource contribution without degrading host-level SLOs. It also improves data reliability through fault-aware data management, which applies disk failure prediction to reduce failure recovery cost; on a real-life product workload this reduces the mean time to repair (MTTR) by 76.3% and decreases the file crash ratio by 35%.
Affiliation: Baidu Inc.
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2015, Issue 2, pp. 273-282 (10 pages)
Keywords: volunteer storage, failure prediction, failure recovery, reliability, archival storage
