Abstract
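In recent years, XML has become the main carrier for information exchange and resource sharing on the Web. However, the inherent redundancy of XML limits its widespread use. Existing research has proposed methods for compressing XML. Compressed XML documents use storage space effectively and save network bandwidth. In practice, XML documents stored in compressed form often need to be updated. For a large compressed document, decompressing before updating consumes a great deal of time, so an efficient update method should avoid decompression and operate directly on the compressed XML document. This paper addresses numeric data (both integer and floating-point) in compressed XML documents and studies how to update values effectively while the document remains compressed. It proposes a Naive numeric update method implemented on top of XPRESS, and a more efficient Pivot numeric update method that modifies the XPRESS encoding. Extensive experiments show that the Pivot numeric update method not only provides efficient update processing but also retains the high compression ratio of XPRESS.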
XML has become the de facto standard for exchanging information on the Web. However, XML data is recognized as verbose, since its heavily repeated tags introduce significant redundancy. To save disk space and network bandwidth, a variety of compression methods have been presented. In practice, query and update are the two most frequently used operations, and efficient update methods are required whenever stored compressed XML data must be modified. In this paper, we focus on the problem of updating numeric data in compressed XML. First, we give a formal definition and classification of the update types for numeric data. Second, we identify the major challenges and bottlenecks in dealing with the problem. Then, we present a naive update method for compressed XML data based on the XPRESS approach. To improve performance, we design a novel method, the Pivot method. Experimental results on the DBLP data set show that the Pivot method achieves better performance without compromising the compression ratio.
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2007, Issue 4, pp. 106-107, 144 (3 pages)
Computer Science
Funding
Supported by a Sybase project