In the current era of cloud computing, data stored in the cloud is being generated at tremendous speed, and the cloud storage system has therefore become one of the key components of cloud computing. Because it stores a substantial amount of data on commodity disks inside the data center that hosts the cloud, the cloud storage system must consider one question very carefully: how do we store data reliably and with high efficiency in terms of both storage overhead and data integrity? Although it is easy to store replicated data to tolerate a certain amount of data loss, replication suffers from very low storage efficiency. Conventional erasure coding techniques, such as Reed-Solomon codes, achieve a much lower storage cost with the same level of tolerance against disk failures. However, they incur much higher repair costs, not to mention even higher access latency. For this reason, designing new coding techniques for cloud storage systems has gained a significant amount of attention in both academia and industry. In this paper, we examine the existing results on coding techniques for cloud storage systems. Specifically, we group these coding techniques into two categories: regenerating codes and locally repairable codes. These two kinds of codes meet the requirements of cloud storage along two different axes: optimizing bandwidth and optimizing I/O overhead. We present an overview of recent advances in these two categories of coding techniques. Moreover, we introduce the main ideas of some specific coding techniques at a high level and discuss their motivations and performance.
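The trade-off described in this abstract can be made concrete with a little arithmetic. The sketch below is an illustration only: the function names and the chosen parameters (3-way replication, a Reed-Solomon code with k = 4 data blocks and m = 2 parity blocks) are assumptions made for the example, not values taken from the paper. It relies on two standard properties: an RS code with k data and m parity blocks tolerates the loss of any m blocks, and conventional repair of one lost block reads k surviving blocks.

```python
from fractions import Fraction


def replication(copies: int) -> dict:
    """Cost profile of keeping `copies` full replicas of every block."""
    return {
        "overhead": Fraction(copies),   # bytes stored per byte of user data
        "tolerated": copies - 1,        # replicas that may be lost
        "repair_reads": 1,              # copy one surviving replica
    }


def reed_solomon(k: int, m: int) -> dict:
    """Cost profile of an RS code with k data blocks and m parity blocks."""
    return {
        "overhead": Fraction(k + m, k),
        "tolerated": m,                 # any m of the k + m blocks may be lost
        "repair_reads": k,              # conventional repair decodes from k blocks
    }


for name, scheme in [("3-way replication", replication(3)),
                     ("RS(k=4, m=2)", reed_solomon(4, 2))]:
    print(f"{name}: {float(scheme['overhead']):.1f}x storage, "
          f"tolerates {scheme['tolerated']} failures, "
          f"reads {scheme['repair_reads']} block(s) per repair")
```

Both schemes in this example tolerate two failures, but the RS code stores half as much data while reading four times as many blocks per repair. That repair cost is what regenerating codes and locally repairable codes aim to reduce.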
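The growing demand for cross-cloud collaborative scheduling of storage and computation places high requirements on cross-cloud data access speed. Consequently, cross-cloud data access methods based on data redundancy techniques (erasure coding and multi-replica storage), which offer higher cross-cloud data access speed, have gradually attracted attention. Among them, erasure-coding-based cross-cloud data access methods have become a current research focus because of their lower storage overhead and higher fault tolerance. To improve data access speed by shortening the transfer time of coded blocks, existing erasure-coding-based cross-cloud data access methods attempt to introduce caching and to optimize the coded-data access scheme. However, because existing methods manage the cache at a coarse granularity and do not jointly optimize cache management and the coded-data access scheme, they suffer from a low number of cache hits, a low benefit per cache hit, and a large number of accesses to coded blocks with low transfer speed, so their coded-block transfer time remains long. To address this, this paper first proposes an InterPlanetary File System (IPFS)-based cross-cloud storage system framework (IBCS), which uses the IPFS data sharding management mechanism to achieve fine-grained cache management and thereby increase the number of cache hits. It then proposes an adaptive erasure-coded data access method for cross-cloud collaborative scheduling of storage and computation (AECAM). AECAM estimates the transfer speed of each coded block during data access from the distribution of coded blocks (including cached coded blocks) and data access nodes, and on that basis formulates a coded-data access scheme that avoids accessing coded blocks with low transfer speed. In addition, AECAM identifies coded blocks that are likely to be selected when it formulates the access scheme but whose actual transfer speed is low, and caches them near the data access nodes, which increases both the number of cache hits and the benefit per hit. Finally, a cross-cloud storage system for collaborative scheduling of storage and computation (C2S2) is built on IBCS and AECAM. Experiments in a cross-cloud environment show that, compared with existing erasure-coding-based storage systems that use caching, C2S2 improves data access speed by 75.22% to 81.29%.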
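The idea of avoiding slow coded blocks can be sketched as follows. This is not AECAM itself: the abstract does not describe how transfer speeds are estimated or how cache candidates are chosen, so the greedy selection below is a simplified stand-in. The function name `plan_access`, the block labels, and all speed values are hypothetical, and the sketch assumes an MDS code in which any k of the n coded blocks are enough to decode.

```python
def plan_access(block_speeds: dict, k: int):
    """Pick the k coded blocks with the highest estimated transfer speed.

    block_speeds maps a block label to its estimated speed (for example MB/s)
    as seen from the data access node. Returns the chosen labels and the
    bottleneck speed, i.e. the slowest block that still has to be fetched.
    """
    if k > len(block_speeds):
        raise ValueError("not enough coded blocks to decode")
    chosen = sorted(block_speeds, key=block_speeds.get, reverse=True)[:k]
    return chosen, min(block_speeds[b] for b in chosen)


# Hypothetical estimates for a stripe with n = 6 blocks and k = 4.
speeds = {"b0": 40, "b1": 5, "b2": 25, "b3": 30, "b4": 8, "b5": 12}
chosen, bottleneck = plan_access(speeds, k=4)
print(chosen, bottleneck)

# Caching a block that is often selected but slow (here b5) near the access
# node raises its estimated speed and therefore the bottleneck of the plan.
cached = dict(speeds, b5=100)
print(plan_access(cached, k=4))
```

With these made-up numbers the bottleneck rises from 12 to 25 once b5 is served from a nearby cache, which illustrates why the abstract couples cache placement with the access scheme rather than treating them separately.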
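Distributed storage systems, as the carriers of data storage, are widely used in the big data field. Compared with replication, erasure-coded storage offers higher space efficiency while still guaranteeing the reliability of data storage, and it is therefore increasingly used in storage systems. In EB-scale erasure-coded distributed storage systems, metadata management is costly, and the efficiency of querying metadata such as location information affects I/O latency and throughput. Centralized data placement algorithms based on recorded location information require frequent access to the metadata server, which limits performance optimization, so decentralized data placement algorithms based on hash mapping are increasingly being adopted. However, during node changes and data recovery, decentralized placement algorithms for erasure codes suffer from difficult location changes, a large volume of migrated data, and low concurrency between data recovery and migration. This paper proposes a stripe-based consistent hash data placement algorithm (SCHash). SCHash places data in units of stripes: by converting the mapping from data blocks to nodes into a mapping from stripes to node groups, it reduces the amount of data migrated when nodes change, thereby lowering the proportion of changed data during recovery and increasing recovery bandwidth. Based on SCHash, a stripe-based concurrent I/O scheduling recovery strategy is also designed. It improves I/O parallelism by avoiding the selection of data blocks on the same node for I/O operations, and it reduces data recovery time by scheduling the execution order of recovery I/O and migration I/O. Compared with the APHash data placement algorithm, SCHash reduces the data migrated during recovery by 46.71% to 85.28%. Recovery bandwidth increases by 48.16% for reconstruction within the stripe and by 138.44% for reconstruction on nodes outside the stripe.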
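The stripe-to-node-group mapping can be illustrated with an ordinary consistent hash ring. The sketch below is a minimal illustration under assumptions, not the SCHash algorithm: the class name `StripeRing`, the use of SHA-256, the virtual-node count, and the rotation rule inside a group are all choices made for this example, and the abstract does not specify any of them.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    """Stable 64-bit hash of a string (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class StripeRing:
    """Consistent hash ring whose points are node groups, not single nodes."""

    def __init__(self, groups: dict, vnodes: int = 64):
        self.groups = dict(groups)  # group name -> list of node names
        self._ring = sorted(
            (_h(f"{g}#{i}"), g) for g in self.groups for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    def group_of(self, stripe_id: int) -> str:
        """Map a whole stripe to the first ring point clockwise of its hash."""
        i = bisect.bisect_right(self._keys, _h(f"stripe:{stripe_id}"))
        return self._ring[i % len(self._ring)][1]

    def place(self, stripe_id: int, n_blocks: int) -> list:
        """Put the stripe's blocks on distinct nodes of its group."""
        nodes = self.groups[self.group_of(stripe_id)]
        if n_blocks > len(nodes):
            raise ValueError("group has fewer nodes than blocks per stripe")
        start = _h(f"rot:{stripe_id}") % len(nodes)  # spread load in the group
        return [nodes[(start + j) % len(nodes)] for j in range(n_blocks)]


groups = {f"g{g}": [f"g{g}-n{i}" for i in range(6)] for g in range(3)}
before = StripeRing(groups)
after = StripeRing({**groups, "g3": [f"g3-n{i}" for i in range(6)]})
moved = [s for s in range(1000) if before.group_of(s) != after.group_of(s)]
print(f"{len(moved)} of 1000 stripes move after adding group g3")
```

Because placement is decided per stripe, adding a group relocates only the stripes that now hash to the new group, and each relocated stripe moves as a unit. Blocks of stripes that stay in their group are not touched.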
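VoIP (voice over IP) is a voice communication technology based on the UDP/IP protocol suite. When channel conditions deteriorate, network packet loss is inevitable, which poses a challenge to the reliable transmission of VoIP steganography built on top of it. This paper proposes using erasure codes to add redundancy to the secret message as a preprocessing step and then combining this with matrix embedding coding to achieve minimum-distortion steganography, thereby establishing embedding and extraction models based on joint coding. On this basis, the influence of key parameters on the performance of the joint coding is analyzed, and an algorithm for selecting the optimal parameters is given. Experimental results show that the proposed joint coding effectively improves the resistance of the steganographic system to packet loss and reduces modifications to the voice stream.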
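To show why matrix embedding reduces modifications to the cover signal, the sketch below uses the textbook construction based on the [7,4] Hamming code: 3 message bits are hidden in 7 cover bits by flipping at most one of them. It covers only the matrix embedding half of the joint coding. The erasure-code preprocessing and the parameter selection algorithm from the paper are not reproduced, and the function names and the choice of the [7,4] code are assumptions made for illustration.

```python
from itertools import product


def syndrome(bits) -> int:
    """Syndrome of 7 bits under the [7,4] Hamming parity-check matrix.

    Column i of the matrix (i = 1..7) is the binary representation of i, so
    the syndrome is the XOR of the 1-based positions that hold a 1.
    """
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= position
    return s


def embed(cover, message: int) -> list:
    """Hide a 3-bit message (0..7) in 7 cover bits, flipping at most one bit."""
    stego = list(cover)
    position = syndrome(cover) ^ message  # 0 means the cover already matches
    if position:
        stego[position - 1] ^= 1
    return stego


def extract(stego) -> int:
    """The receiver recovers the message as the syndrome of the stego bits."""
    return syndrome(stego)


# Exhaustive check over every cover block and every message.
worst = 0
for cover in product((0, 1), repeat=7):
    for message in range(8):
        stego = embed(cover, message)
        assert extract(stego) == message
        worst = max(worst, sum(c != s for c, s in zip(cover, stego)))
print("maximum number of changed bits per block:", worst)
```

Writing the same 3 bits directly into 3 cover bits would change 1.5 bits on average. The Hamming-based scheme changes at most one bit per 7-bit block, which is the sense in which matrix embedding lowers distortion.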