Double Layer Deduplication Algorithm Based on Fingerprint Extremum
Abstract: To raise the deduplication ratio, reduce the forced (hard) chunking of content-defined chunking (CDC), and balance deduplication ratio against performance, this paper proposes a double layer deduplication algorithm based on fingerprint extremum (DDFE). The first deduplication layer uses a large chunk size to keep deduplication fast. The non-duplicate data left after the first layer is then passed to a second layer with a smaller chunk size, which keeps deduplication precise. For data chunking, the paper proposes a fingerprint-extremum chunking algorithm that operates within a tolerable range and reduces the effect of forced chunking on deduplication. Experiments under a variety of chunk-size combinations show the following advantages over any traditional single-layer deduplication algorithm. DDFE better prevents forced chunking and balances performance against time. It also eliminates more duplicate data across large numbers of small chunks and frequently changing data.
Authors: WANG Qing-song (王青松), GE Hui (葛慧), College of Information, Liaoning University, Shenyang 110036, China
Source: Journal of Liaoning University: Natural Sciences Edition (《辽宁大学学报(自然科学版)》), CAS-indexed, 2018, No. 3, pp. 201-207 (7 pages)
Funding: National Natural Science Foundation of China (Grant No. 61502215)
Keywords: deduplication; fingerprint extremum; backup system; Hadoop; data storage
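The abstract describes the two layers and the fingerprint-extremum chunking only in outline. The following Python sketch illustrates the idea under several loudly stated assumptions, none of which come from the paper:

- Boundaries come from a Gear rolling hash (the style popularized by FastCDC).
- Chunks are fingerprinted with SHA-1.
- The "tolerable range" is the window [min_size, max_size].
- "Fingerprint extremum" is read as follows: when no ordinary CDC anchor appears in that window, cut where the rolling hash peaked rather than forcing a cut at max_size.

The names `chunk`, `TwoLayerDedup`, and all chunk sizes are hypothetical. This is not the authors' implementation.

```python
import hashlib
import random

# Hypothetical Gear table: 256 pseudo-random 64-bit values (fixed seed, illustrative only).
_rng = random.Random(2018)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data: bytes, min_size: int, avg_size: int, max_size: int) -> list[bytes]:
    """Content-defined chunking with an assumed fingerprint-extremum fallback."""
    assert (avg_size & (avg_size - 1)) == 0, "avg_size must be a power of two"
    bits = avg_size.bit_length() - 1
    # Test the high bits of the Gear hash: they depend on the last ~64 bytes.
    anchor_mask = ((1 << bits) - 1) << (64 - bits)
    chunks, start, n = [], 0, len(data)
    while start < n:
        end = min(start + max_size, n)
        if end - start <= min_size:
            chunks.append(data[start:end])  # short tail: emit as the last chunk
            break
        h, cut = 0, None
        best_pos, best_val = end, -1
        for i in range(start, end):
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            pos = i + 1  # candidate boundary just after byte i
            if pos - start < min_size:
                continue
            if (h & anchor_mask) == 0:
                cut = pos  # ordinary CDC anchor found
                break
            if h > best_val:
                best_val, best_pos = h, pos  # track the fingerprint maximum
        if cut is None:
            # No anchor inside [min_size, max_size]: instead of a hard cut at
            # max_size, cut where the fingerprint peaked (still content-defined).
            cut = n if end == n else best_pos
        chunks.append(data[start:cut])
        start = cut
    return chunks


def fingerprint(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()


class TwoLayerDedup:
    """Hypothetical DDFE-style store: large chunks first, small chunks only for layer-1 misses."""

    def __init__(self, coarse=(2048, 4096, 16384), fine=(256, 512, 2048)):
        self.coarse, self.fine = coarse, fine  # (min, avg, max) sizes, assumed values
        self.coarse_index: set[str] = set()
        self.fine_index: set[str] = set()
        self.stored_bytes = 0  # bytes that would actually be written to storage

    def write(self, data: bytes) -> None:
        for big in chunk(data, *self.coarse):
            fp = fingerprint(big)
            if fp in self.coarse_index:
                continue  # layer 1 hit: the whole large chunk is a duplicate
            self.coarse_index.add(fp)
            # Layer 2: re-chunk the unique large chunk with a smaller chunk size.
            for small in chunk(big, *self.fine):
                sfp = fingerprint(small)
                if sfp not in self.fine_index:
                    self.fine_index.add(sfp)
                    self.stored_bytes += len(small)


if __name__ == "__main__":
    rng = random.Random(7)
    base = bytes(rng.getrandbits(8) for _ in range(200_000))
    edited = base[:100_000] + b"INSERTED" + base[100_000:]
    store = TwoLayerDedup()
    store.write(base)
    store.write(edited)
    print(store.stored_bytes, "bytes stored for", len(base) + len(edited), "bytes written")
```

The script writes a 200 KB random buffer and then a copy with 8 bytes inserted in the middle. Most large chunks of the second write should hit in layer 1. Only the few large chunks around the edit fall through to layer 2, where most of their small chunks are again recognized as duplicates. Cutting at the hash maximum instead of at max_size keeps every boundary a function of content. This is, in spirit, how the abstract says fingerprint-extremum chunking limits the damage of forced chunking.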