In the current era of cloud computing, data stored in the cloud is being generated at a tremendous speed, and the cloud storage system has therefore become one of the key components of cloud computing. Because it stores a substantial amount of data on commodity disks inside the data center that hosts the cloud, a cloud storage system must consider one question very carefully: how do we store data reliably and efficiently, in terms of both storage overhead and data integrity? Although it is easy to tolerate a certain number of data losses by storing replicas, replication suffers from very low storage efficiency. Conventional erasure coding techniques, such as Reed-Solomon codes, achieve a much lower storage cost with the same level of tolerance against disk failures. However, they incur much higher repair costs, not to mention higher access latency. Hence, designing new coding techniques for cloud storage systems has attracted significant attention in both academia and industry. In this paper, we examine existing coding techniques for cloud storage systems. Specifically, we divide these techniques into two categories, regenerating codes and locally repairable codes, which meet the requirements of cloud storage along two different axes: optimizing bandwidth and I/O overhead, respectively. We present an overview of recent advances in these two categories, introduce the main ideas of some specific coding techniques at a high level, and discuss their motivations and performance.
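As a quick back-of-the-envelope illustration of the trade-off this survey highlights (the parameters below are illustrative, not taken from the paper), the following Python sketch compares 3-way replication with a Reed-Solomon code RS(k=6, m=3): the erasure code halves the storage overhead but multiplies the traffic needed to repair a single lost block by k.

```python
# Illustrative comparison (not from the survey): storage overhead and
# naive single-block repair traffic for 3-way replication versus a
# Reed-Solomon code RS(k=6, m=3).

def replication_stats(copies: int, block: float = 1.0):
    overhead = copies * block          # bytes stored per byte of data
    repair_traffic = block             # repair reads one surviving replica
    return overhead, repair_traffic

def reed_solomon_stats(k: int, m: int, block: float = 1.0):
    overhead = (k + m) / k * block     # e.g. 1.5x for RS(6, 3)
    repair_traffic = k * block         # conventional RS reads k blocks
    return overhead, repair_traffic

print(replication_stats(3))        # (3.0, 1.0)
print(reed_solomon_stats(6, 3))    # (1.5, 6.0)
```

Regenerating codes and locally repairable codes attack exactly this repair term, from the bandwidth side and the I/O side respectively.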
Network coding (NC), introduced at the turn of the century, enables nodes in a network to combine data algebraically before sending or forwarding them. Random network coding has gained popularity over the years by combining received packets randomly before forwarding them, at the cost of a complex Jordan Gaussian Elimination (JGE) decoding process. Random NC owes its effectiveness to cooperation among nodes. In this paper, we propose a simple, low-complexity cooperative protocol that exploits NC in a deterministic manner, resulting in improved diversity, a higher data rate, and a less complex JGE decoding process. The proposed system is applied over a lossy wireless network. The scenario under investigation is as follows: M users must send their information to a common destination D and exchange information with one another over erasure channels; typically, the channels between the users and the destination are worse than the channels between users. By combining packets from different users in simple, deterministic ways, without resorting to extensive header information, before forwarding them to the destination and the M users, it is possible to significantly reduce the traffic between users and destination and achieve significant bandwidth savings. The key problem we address is how to efficiently combine the packets at each user, exploiting user cooperation, so as to maximize the probability of successfully recovering the information of all users at D from k < 2M unique linear equations, accounting for the fact that the remaining packets will be lost in the network and that there are two transmission stages. Simulation results show the behaviour for two and three transmission stages. Our results show that applying the NC protocols in two or three stages decreases the traffic significantly; moreover, the proposed protocols enable the system to retrieve lost packets rather than request ARQ retransmissions, resulting in improved data flow and lower power consumption. In fact, in some protocols the ARQ dropped from the r…
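The following toy Python sketch (an editorial illustration, not the paper's protocol; the two-user, single-combination setup is an assumption) shows the core idea of deterministic combining: a cooperating node forwards the XOR of two users' packets, and the destination recovers a lost native packet from that combination instead of requesting an ARQ retransmission.

```python
# Toy illustration (not the paper's exact scheme): deterministic network
# coding over an erasure channel. Users A and B each send one packet; a
# cooperating node forwards their XOR. If the destination loses one
# native packet, the XOR combination restores it.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

pkt_a = b"AAAA-payload"
pkt_b = b"BBBB-payload"
coded = xor(pkt_a, pkt_b)          # sent in the cooperation stage

# Suppose pkt_a is erased on the user-to-destination channel:
received = {"b": pkt_b, "coded": coded}
recovered_a = xor(received["coded"], received["b"])
assert recovered_a == pkt_a        # lost packet retrieved without ARQ
```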
The K-means method is a well-known clustering algorithm with an extensive range of applications, such as biological classification, disease analysis, data mining, and image compression. However, the plain K-means method is not fast when the number of clusters or the number of data points becomes large. A modified K-means algorithm was presented by Fahim et al. (2006); it produced clusters whose mean square error was very similar to that of plain K-means, but with a shorter execution time. In this study, we try to increase its speed further. Our method uses two rules: a selection rule, used to acquire a good candidate as the initial center to be checked, and an erasure rule, used to delete one or more unqualified centers whenever a specified condition is satisfied. Our clustering results are identical to those of Fahim et al. (2006), but our method further cuts computation time as the number of clusters increases. The mathematical reasoning behind our design is included. (Funding: Project No. 100-2221-E-009-141-MY3, National Science Council.)
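For context, the following minimal Python sketch reconstructs the Fahim-style shortcut this study builds on, under simplifying assumptions (an editorial reading, not the authors' code): each point is first compared only with its previously assigned center, and the full scan over all k centers runs only when the point has drifted farther away.

```python
# Sketch of the Fahim et al. (2006) distance shortcut (illustrative
# reconstruction, not the authors' implementation).
import numpy as np

def kmeans_fahim(X, centers, iters=20):
    X, centers = np.asarray(X, float), np.asarray(centers, float)
    d_all = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
    labels = d_all.argmin(axis=1)          # initial full assignment
    dists = d_all.min(axis=1)
    for _ in range(iters):
        for j in range(len(centers)):      # recompute centroids
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
        for i in range(len(X)):
            d_old = np.linalg.norm(X[i] - centers[labels[i]])
            if d_old <= dists[i]:          # point hasn't drifted: skip scan
                dists[i] = d_old
            else:                          # rescan all k centers
                d = np.linalg.norm(centers - X[i], axis=1)
                labels[i] = int(d.argmin())
                dists[i] = d[labels[i]]
    return labels, centers

X = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
labels, centers = kmeans_fahim(X, X[[0, 2]].copy())
print(labels)   # [0 0 1 1]
```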
To reduce the time required to complete the regeneration process of erasure codes, we propose a Tree-structured Parallel Regeneration (TPR) scheme for multiple data losses in distributed storage systems. Under this scheme, two algorithms are proposed for constructing multiple regeneration trees: an edge-disjoint algorithm and an edge-sharing algorithm. The edge-disjoint algorithm constructs multiple independent trees; it is simple and appropriate for environments where newcomers and their providers are distributed over a large area and have few intersections. The edge-sharing algorithm constructs multiple trees that compete for the bandwidth and thereby utilize it better, although it needs to measure the available bandwidth and handle bandwidth changes, which makes it difficult to implement in practical systems. The parallel regeneration of multiple data losses in TPR includes two main optimizations: first, transferring data over bandwidth-optimized paths in a pipelined manner; second, executing data regeneration over multiple trees in parallel. To evaluate the proposal, we implement an event-based simulator and make a detailed comparison with several popular regeneration methods. The quantitative results show that TPR, with either the edge-disjoint or the edge-sharing algorithm, reduces the regeneration time significantly. (Funding: National Grand Fundamental Research of China (973 Program), Grant No. 2011CB302601; National High Technology Research and Development Program of China (863 Program), Grant No. 2013AA01A213; National Natural Science Foundation of China, Grant No. 60873215; Natural Science Foundation for Distinguished Young Scholars of Hunan Province, Grant No. S2010J5050; Specialized Research Fund for the Doctoral Program of Higher Education, Grant No. 20124307110015.)
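As a hedged sketch of one plausible building block (an editorial illustration; the paper's tree-construction algorithms are more elaborate), the following Python code grows a Prim-style maximum-bandwidth regeneration tree rooted at the newcomer, so that each provider's data travels over the widest available links.

```python
# Illustrative regeneration-tree construction (not the paper's exact
# algorithm): greedily attach nodes over the highest-bandwidth edges,
# i.e. a maximum spanning tree rooted at the newcomer.
import heapq

def regeneration_tree(bw: dict, root: str):
    """bw: {(u, v): bandwidth}; returns the tree edges chosen greedily."""
    nodes = {u for edge in bw for u in edge}
    in_tree, edges = {root}, []
    heap = [(-b, u, v) for (u, v), b in bw.items() if u == root]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(nodes):
        neg_b, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        edges.append((u, v, -neg_b))
        for (a, c), b in bw.items():       # expand frontier from v
            if a == v and c not in in_tree:
                heapq.heappush(heap, (-b, a, c))
    return edges

bw = {("new", "p1"): 10, ("new", "p2"): 3, ("p1", "p2"): 8}
print(regeneration_tree(bw, "new"))   # p2 joins via p1, since 8 > 3
```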
A novel channel estimation algorithm is presented in this paper for the recently proposed cyclic-postfix-based orthogonal frequency division multiplexing (OFDM) systems. Phase equalization with an erasure decision is used to reduce both the channel estimation error and the computational complexity. Simulation results show that the proposed algorithm can effectively estimate the channel impulse response (CIR), and that the performance of the proposed phase equalization with erasure decision is comparable to that of minimum mean square error (MMSE) equalization at a lower computational complexity. (Funding: New Century Excellent Talent Support Project research grant from the Ministry of Education; High-tech R&D Program of China (863 Program), Grant No. 2007AA01Z2B6.)
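A minimal numpy sketch of the general idea, under illustrative assumptions (the threshold and signal model are not the paper's): subcarriers whose estimated channel gain is too weak are declared erasures, and the remaining ones are corrected in phase only, avoiding the per-tone divisions and noise statistics that MMSE equalization requires.

```python
# Illustrative phase equalization with an erasure decision (not the
# paper's exact algorithm): rotate each subcarrier by the negative of
# the estimated channel phase; erase tones with unreliable gain.
import numpy as np

def phase_equalize(Y, H_est, erase_thresh=0.1):
    """Y: received subcarriers; H_est: frequency-domain channel estimate."""
    erased = np.abs(H_est) < erase_thresh     # unreliable tones -> erasures
    Z = Y * np.exp(-1j * np.angle(H_est))     # phase-only correction
    Z[erased] = 0
    return Z, erased

rng = np.random.default_rng(0)
H = rng.normal(size=8) + 1j * rng.normal(size=8)
X = np.sign(rng.normal(size=8))               # BPSK symbols
Y = H * X + 0.01 * rng.normal(size=8)
Z, erased = phase_equalize(Y, H)
print(np.sign(Z.real)[~erased] == X[~erased])  # non-erased tones detected
```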
Erasure codes over binary fields, which need only AND and XOR operations in encoding and decoding and therefore have high computational efficiency, are widely used in many fields of information technology. A matrix decoding method is proposed in this paper. The method is a universal data reconstruction scheme for erasure codes over binary fields. Besides pre-judging whether the lost data can be recovered, the method can rebuild sectors of lost data on a fault-tolerant storage system constructed with erasure codes when disk errors occur. The data reconstruction process of the new method has simple and clear steps, which makes it easy to implement in software. Moreover, it can easily be extended to non-binary fields, so the method is expected to find extensive application in the future. (Funding: National Natural Science Foundation of China, Grant No. 61501064; Sichuan Provincial Science and Technology Project, Grant No. 2016GZ0122.)
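The following Python sketch illustrates the flavor of such matrix decoding under a deliberately simplified setup (a single-parity code; not the paper's exact scheme): Gaussian elimination with XOR row operations over GF(2) both pre-judges recoverability (full rank or not) and rebuilds the lost blocks.

```python
# Illustrative matrix decoding over GF(2) (simplified, not the paper's
# scheme): survivors are rows of a binary coefficient matrix; XOR-based
# elimination recovers the original data blocks if the rank suffices.
import numpy as np

def gf2_solve(A, B):
    """Solve A x = B over GF(2); A: (m, k) 0/1 matrix, B: (m, L) blocks.
    Returns the k recovered data blocks, or None if rank < k."""
    A, B = A.copy() % 2, B.copy() % 2
    m, k = A.shape
    row = 0
    for col in range(k):
        piv = next((r for r in range(row, m) if A[r, col]), None)
        if piv is None:
            return None                       # pre-judgment: unrecoverable
        A[[row, piv]], B[[row, piv]] = A[[piv, row]], B[[piv, row]]
        for r in range(m):                    # XOR-eliminate other rows
            if r != row and A[r, col]:
                A[r] ^= A[row]
                B[r] ^= B[row]
        row += 1
    return B[:k]

# Example: data blocks d0, d1 with parity p = d0 ^ d1; d0 is lost.
d0, d1 = np.array([[1, 0, 1, 1]]), np.array([[0, 1, 1, 0]])
p = d0 ^ d1
A = np.array([[0, 1],    # survivor d1 = 0*d0 + 1*d1
              [1, 1]])   # survivor p  = 1*d0 + 1*d1
B = np.vstack([d1, p])
print(gf2_solve(A, B))   # rows: recovered d0, then d1
```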
Fault-tolerance is increasingly significant for large-scale storage systems, in which Byzantine failures of storage nodes may happen. Traditional Byzantine quorum systems that tolerate Byzantine failures by replication have two main limitations: low space efficiency and static quorum variables. We propose an Erasure-code Byzantine Fault-tolerance Quorum that provides high reliability with far lower storage overhead than replication by adopting erasure codes as the redundancy scheme. Through the read/write operations of clients and the diagnosis operation of a supervisor, our quorum system can detect Byzantine nodes and dynamically adjust the system size and fault threshold. Simulation results show that our method improves the performance of the quorum system while keeping quorums relatively small. (Funding: National Natural Science Foundation of China, Grant No. 60373088.)
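To make the sizing question concrete, here is a small Python sketch under a deliberately simplified model (an editorial assumption, not necessarily the paper's exact quorum conditions): with n nodes, at most f Byzantine, and a k-of-n erasure code, a quorum of size q is safe if the worst-case overlap of two quorums still contains k correct fragments, and live if a quorum survives f failures.

```python
# Simplified quorum-sizing arithmetic for an erasure-coded Byzantine
# quorum (illustrative model only): two quorums of size q overlap in at
# least 2q - n nodes, of which up to f may be faulty; the correct part
# of the overlap must still hold k decodable fragments.

def quorum_ok(n: int, f: int, k: int, q: int) -> bool:
    overlap = 2 * q - n            # worst-case intersection of two quorums
    return (overlap - f >= k       # safety: decodable from correct overlap
            and n - f >= q)        # liveness: a live quorum always exists

def smallest_quorum(n: int, f: int, k: int):
    return next((q for q in range(1, n + 1) if quorum_ok(n, f, k, q)), None)

# Replication (k = 1) versus a 3-of-n erasure code, with n = 10, f = 1:
print(smallest_quorum(10, 1, 1))   # 6
print(smallest_quorum(10, 1, 3))   # 7
```

Under this toy model the erasure-coded variant needs only a slightly larger quorum while storing far less data per node, which is the trade-off the paper exploits.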
Write/erase degradation after endurance cycling due to electron-trapping events in triple-gate flash memory has been detected and analyzed using a UV erasure method. Unlike the commonly observed degradation phenomenon, write-induced electron trapping in the floating-gate oxide, electron trapping in the tunneling oxide is observed in triple-gate flash memory. Furthermore, the degradation due to single-electron local trapping/de-trapping seen in horn-shaped SuperFlash does not occur in the triple-gate flash cell, because the triple-gate cell uses planar poly-to-poly erasing instead of the tip erasing of the horn-shaped SuperFlash cell. Moreover, TCAD simulation identifies the trap location and roughly quantifies the magnitude of its density.