Abstract
Detecting and extracting incremental data from information sources efficiently and in a timely manner is the primary problem in data warehousing and other forms of data integration. For simple data sources, changes are usually detected by comparing snapshots of the source taken at two points in time. Starting from the cost and efficiency of the traditional Sort Merge snapshot differential algorithm, this paper analyzes possible ways to improve its efficiency and speed, and proposes a Sort Merge algorithm based on a variant of MD5 that compresses record contents. The new algorithm reduces both the amount of data compared and the volume of input/output, and significantly improves the efficiency of the algorithm.
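The idea of combining a sort-merge snapshot comparison with content digests can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the MD5 variant, so the standard `hashlib.md5` stands in for it, and the function names `digest_snapshot` and `snapshot_diff`, the `(key, content)` row layout, and the assumption of unique keys within a snapshot are all hypothetical choices made for illustration.

```python
import hashlib


def digest_snapshot(rows):
    """Replace each row's content with a fixed-size MD5 digest and sort by key.

    Assumption: rows are (key, content) pairs with unique, comparable keys.
    Standard MD5 is a stand-in for the paper's unspecified MD5 variant.
    """
    return sorted(
        (key, hashlib.md5(content.encode("utf-8")).digest())
        for key, content in rows
    )


def snapshot_diff(old_rows, new_rows):
    """Sort-merge two snapshots and report inserts, deletes, and updates."""
    old = digest_snapshot(old_rows)
    new = digest_snapshot(new_rows)
    changes = []
    i = j = 0
    while i < len(old) and j < len(new):
        old_key, old_digest = old[i]
        new_key, new_digest = new[j]
        if old_key == new_key:
            # Same key in both snapshots: compare 16-byte digests only.
            if old_digest != new_digest:
                changes.append(("update", old_key))
            i += 1
            j += 1
        elif old_key < new_key:
            # Key exists only in the old snapshot.
            changes.append(("delete", old_key))
            i += 1
        else:
            # Key exists only in the new snapshot.
            changes.append(("insert", new_key))
            j += 1
    # Whatever remains on either side has no counterpart.
    changes.extend(("delete", key) for key, _ in old[i:])
    changes.extend(("insert", key) for key, _ in new[j:])
    return changes
```

Because each record is reduced to a 16-byte digest before the merge pass, the comparison step touches far less data than comparing full records would, which is the kind of saving the abstract describes. A digest collision would, in principle, hide an update; this sketch does not address that case.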
Source
《微计算机应用》
2010, No. 12, pp. 1-7 (7 pages)
Microcomputer Applications