Abstract
To raise the deduplication rate, reduce forced (hard) chunk boundaries in content-defined chunking (CDC), and balance deduplication rate against performance, a double-layer deduplication algorithm based on fingerprint extremum (DDFE) is proposed. The first-layer deduplication model uses a large chunk size to keep the deduplication operation fast; the non-duplicate data left after the first layer is then fed into a second-layer deduplication model with a smaller chunk size to ensure deduplication precision. For data chunking, a fingerprint-extremum chunking algorithm that operates within a tolerable range is proposed, which reduces the effect of forced chunking on deduplication. Experimental results over a variety of chunk-size combinations show that, compared with any traditional single-layer deduplication algorithm, DDFE better prevents forced chunking, balances performance and time, and eliminates more duplicate data among large numbers of small data blocks and frequently changing data.
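The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no parameters or formulas, so everything concrete here is an assumption made for illustration. In particular, the names `fingerprint`, `chunk`, `dedup_layer` and `ddfe_sketch`, the chunk sizes, the use of a BLAKE2 hash over a 16-byte window in place of a rolling hash, and the reading of "fingerprint extremum" as "cut at the position with the largest fingerprint inside the tolerated size range when no ordinary boundary is found" are all hypothetical.

```python
import hashlib

WINDOW = 16  # bytes hashed per candidate cut point (assumed value)


def fingerprint(data, i):
    """Fingerprint of the window of up to WINDOW bytes ending at index i.

    A plain hash stands in for a rolling hash to keep the sketch short.
    """
    window = data[max(0, i - WINDOW + 1): i + 1]
    return int.from_bytes(hashlib.blake2b(window, digest_size=4).digest(), "big")


def chunk(data, divisor, min_size, max_size):
    """Content-defined chunking with an assumed fingerprint-extremum fallback.

    An ordinary boundary is a position whose fingerprint satisfies
    fp % divisor == divisor - 1. If none occurs between min_size and
    max_size, the cut goes to the position with the largest fingerprint
    in that range instead of a hard cut at max_size.
    """
    chunks, start, n = [], 0, len(data)
    while start < n:
        if n - start <= min_size:
            chunks.append(data[start:])  # short tail becomes the last chunk
            break
        end = min(start + max_size, n)
        cut, best_pos, best_fp = None, None, -1
        for i in range(start + min_size - 1, end):
            fp = fingerprint(data, i)
            if fp % divisor == divisor - 1:
                cut = i + 1  # ordinary content-defined boundary
                break
            if fp > best_fp:
                best_pos, best_fp = i + 1, fp  # remember the extremum
        if cut is None:
            # No boundary found: end of input, or fall back to the extremum.
            cut = n if end == n else best_pos
        chunks.append(data[start:cut])
        start = cut
    return chunks


def dedup_layer(blocks, store, divisor, min_size, max_size):
    """Chunk each block, store unseen chunks, and return the unseen chunks."""
    unseen = []
    for block in blocks:
        for c in chunk(block, divisor, min_size, max_size):
            key = hashlib.sha1(c).hexdigest()
            if key not in store:
                store[key] = c
                unseen.append(c)
    return unseen


def ddfe_sketch(data, coarse_store, fine_store):
    """Two layers: coarse chunks first, then fine chunks on what was not a duplicate.

    Returns the number of bytes that were new to the fine-grained store.
    """
    unique_coarse = dedup_layer([data], coarse_store, 4096, 2048, 8192)
    unique_fine = dedup_layer(unique_coarse, fine_store, 512, 256, 1024)
    return sum(len(c) for c in unique_fine)
```

Calling `ddfe_sketch` twice with the same input and the same two stores shows the layering: on the second call every coarse chunk is already known, so nothing reaches the fine-grained layer and the function returns 0.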
Authors
王青松
葛慧
WANG Qing-song; GE Hui (College of Information, Liaoning University, Shenyang 110036, China)
Source
《辽宁大学学报(自然科学版)》
CAS
2018, No. 3, pp. 201-207 (7 pages)
Journal of Liaoning University:Natural Sciences Edition
Funding
National Natural Science Foundation of China (Grant No. 61502215)