Abstract
Large volumes of large-scale, data-intensive datasets must be stored across multiple data centers, and the increasingly widespread cloud computing environment handles the scale problems that arise when allocating such data. However, allocating data across multiple data centers in a cloud environment causes data to be transferred between centers, which lowers data access efficiency. In addition, an imbalance in the volume of data accessed per unit time can create access bottlenecks at individual data centers. Taking the data flows within large-scale intensive data as the modeling object, this paper proposes a data allocation algorithm that keeps the load on data centers balanced while also accounting for the dependencies among intensive data. Experiments show that, compared with similar data allocation algorithms, the proposed algorithm achieves better overall performance, and it is particularly effective at keeping data center load balanced.
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2012, No. 5, pp. 141-146, 171 (7 pages)
Funding
Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences (KGCX2-YW-174)
Keywords
Data allocation
Cloud computing
Large-scale intensive data
Load balance
Data dependency