Abstract
Because lossless backup records every data change, it generates a very large volume of backup data, which lengthens the backup process and degrades storage performance. To improve data storage in big-data settings, a distributed incremental backup method for lossless databases that accounts for bandwidth limits is proposed. Similarity calculation is introduced to extract, for each defective record in the database, neighboring data with similar attributes; these are combined with the Group Method of Data Handling (GMDH) algorithm, described by the authors as adaptive multi-level decision tree optimization, to build a computation structure of optimal complexity and impute the defective data. The imputed data are then compressed with the Lempel-Ziv-Welch (LZW) lossless compression algorithm. Feature vectors of different dimensions describe the data categories, and the category of each data item is determined by combining Bootstrap resampling with probability theory. Data of different categories are backed up to different branches of an incremental backup tree; when data are updated, querying the data at the tree's branch nodes allows only non-duplicate data to be backed up incrementally. Experiments show that the proposed method achieves efficient incremental backup with low bandwidth usage, which is significant for protecting application data.
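The abstract names LZW as the compression step but gives no implementation details. As a rough illustration only, the following is a minimal textbook LZW sketch in Python; the function names `lzw_compress` and `lzw_decompress`, the byte-oriented dictionary, and the unbounded code table are assumptions made for this example and are not taken from the paper.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit one integer code per longest known prefix."""
    # Assumption: start from the 256 single-byte strings, grow without limit.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes: list[int] = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while it is still in the dictionary.
            current = candidate
        else:
            # Emit the longest match and register the new, longer string.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the dictionary on the fly and invert lzw_compress."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the string being defined now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)
```

Because decompression restores the input exactly, a step of this kind is compatible with the lossless requirement stated in the abstract; how the paper encodes the codes for transmission under a bandwidth limit is not specified here.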
Authors
CAO De-sheng (曹德胜)
CHENG Gang (程刚)
XU Bang-shu (徐帮树)
Affiliations: School of Computer Science, North China Institute of Science and Technology, Langfang, Hebei 101601, China; School of Qilu Transportation, Shandong University, Jinan, Shandong 250061, China
Source
Computer Simulation (《计算机仿真》)
2024, No. 10, pp. 328-332 (5 pages)
Funding
National Natural Science Foundation of China (42377200).
Keywords
Similarity calculation
Optimal complexity
LZW algorithm
Bootstrap algorithm
Incremental backup tree