Abstract
Data deduplication removes redundant data during backup, saving storage resources and network bandwidth, and is currently a key research problem in data storage. To address the low deduplication ratio and low throughput of traditional methods, a new distributed data-deduplication backup method for large text databases is proposed. The basic idea of distributed data deduplication in large text databases is introduced: a file-management component queries and gathers statistics on the fingerprint data of each chunk group, a chunk-group routing strategy is given, and the data-prefetching process is analyzed. Data blocks are then ordered by their final weights, and a fitness function is designed to minimize both recovery time and recovery cost. Finally, an improved genetic algorithm is used to back up the distributed data in large text databases. Experimental results show that the proposed method achieves a high deduplication ratio and high throughput, together with low cost and fast recovery.
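(The abstract names a fingerprint-driven chunk-group routing strategy without giving implementation detail; the following Python sketch is a hypothetical illustration of that general technique, not the paper's method. The chunk size, group size, SHA-1 fingerprinting, and all identifiers are assumptions.)

import hashlib

# Hypothetical sketch of fingerprint-based chunk-group routing: each chunk
# is identified by its SHA-1 fingerprint, and a chunk group is routed to
# the storage node that already holds the most of its fingerprints, so
# duplicate chunks land on the same node and can be deduplicated there.
CHUNK_SIZE = 4096   # fixed-size chunking, for simplicity
GROUP_SIZE = 64     # chunks per routed group

def fingerprints(data):
    # Split data into fixed-size chunks and fingerprint each one.
    return [hashlib.sha1(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]

class Node:
    def __init__(self, name):
        self.name = name
        self.index = set()  # fingerprints already stored on this node

    def store(self, fps):
        # Deduplicate: keep only fingerprints the node has not yet seen.
        new = [fp for fp in fps if fp not in self.index]
        self.index.update(new)
        return len(new)

def route(group, nodes):
    # Send the group to the node with the highest fingerprint overlap.
    return max(nodes, key=lambda n: sum(fp in n.index for fp in group))

nodes = [Node("n0"), Node("n1"), Node("n2")]
fps = fingerprints(b"example " * 100000)  # highly redundant input
for i in range(0, len(fps), GROUP_SIZE):
    group = fps[i:i + GROUP_SIZE]
    route(group, nodes).store(group)

Because routing keys on fingerprint overlap, the redundant groups above all converge on one node, which is what preserves the deduplication ratio in a distributed setting.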
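(Likewise, the weighted block arrangement and fitness function are only named. The minimal sketch below assumes a sequential-restore time model, a per-block retrieval-cost model, and weights ALPHA and BETA that the abstract does not specify, with a basic generational loop standing in for the paper's improved genetic algorithm.)

import random

# Hypothetical fitness for a genetic search over data-block arrangements:
# a candidate is a permutation of block indices, and fitness rewards
# orderings that minimize a weighted sum of recovery time and recovery
# cost. ALPHA, BETA, and both cost models are illustrative assumptions.
ALPHA, BETA = 0.6, 0.4

def recovery_time(order, block_sizes, read_rate=100.0):
    # Assume blocks are restored sequentially, so each block's restore
    # finishes after all blocks placed before it; sum the finish times.
    t, total = 0.0, 0.0
    for i in order:
        t += block_sizes[i] / read_rate
        total += t
    return total

def recovery_cost(order, block_costs):
    # Assume blocks placed later can sit on cheaper storage, so a
    # block's cost is discounted by its position in the arrangement.
    return sum(block_costs[i] / (pos + 1) for pos, i in enumerate(order))

def fitness(order, block_sizes, block_costs):
    # Smaller combined objective -> higher fitness.
    return 1.0 / (1.0 + ALPHA * recovery_time(order, block_sizes)
                      + BETA * recovery_cost(order, block_costs))

def evolve(block_sizes, block_costs, pop=30, gens=200, seed=0):
    # Plain generational loop: keep the fitter half, refill with
    # swap-mutated copies (a stand-in for the improved GA).
    rng = random.Random(seed)
    n = len(block_sizes)
    population = [rng.sample(range(n), n) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda o: fitness(o, block_sizes, block_costs),
                        reverse=True)
        survivors = population[:pop // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.randrange(n), rng.randrange(n)
            child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        population = survivors + children
    return max(population, key=lambda o: fitness(o, block_sizes, block_costs))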
Source
Science Technology and Engineering (《科学技术与工程》)
Peking University Core Journal (北大核心)
2018, No. 4, pp. 310-315 (6 pages)
Funding
Fundamental Research Funds for the Central Universities (XDJK2014C110)
Science and Technology Foundation of Guizhou Province (黔科合LH字[2014]7538号)
Keywords
text database
distributed data
deduplication
backup