Abstract
Answering queries over continuous data streams is an essential requirement in many application environments. This paper explores how to answer aggregate SQL queries over data streams approximately using limited memory. Randomized sketching techniques are used to compute very small sketch synopses of the streams, from which approximate answers to aggregate queries can be derived with provable guarantees on the approximation error. The paper also discusses how existing histogram-based statistics can be used within the sketch method to improve the quality of the answers. The key idea is to partition the attribute domains intelligently and thereby decompose the sketching problem, so that the query results have a suitable approximation accuracy. Both theoretical analysis and experiments show that sketches give significantly more accurate and effective answers to aggregate queries than traditional histograms.
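To illustrate the kind of randomized sketch the abstract refers to, the following is a minimal Python sketch of a generic "tug-of-war" (AMS-style) synopsis for estimating the size of an equi-join (a COUNT aggregate) between two streams. It is not taken from the paper: the class name `JoinSketch`, the function `estimate_join_size`, the polynomial hash, and the parameters `groups` and `per_group` are all assumptions chosen for illustration, and the histogram-guided domain partitioning described above is not implemented here.

```python
import random
from statistics import median

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash polynomial


class JoinSketch:
    """Hypothetical AMS-style sketch: one signed counter per random +/-1 hash."""

    def __init__(self, groups, per_group, seed=0):
        rng = random.Random(seed)
        self.groups = groups
        self.per_group = per_group
        n = groups * per_group
        # Degree-3 polynomial coefficients give approximately four-wise
        # independent signs (an assumption of this illustration).
        self.coeffs = [[rng.randrange(1, PRIME) for _ in range(4)] for _ in range(n)]
        self.counters = [0] * n

    def _sign(self, i, value):
        a, b, c, d = self.coeffs[i]
        h = (((a * value + b) * value + c) * value + d) % PRIME
        return 1 if h & 1 else -1

    def update(self, value, weight=1):
        """Process one stream element; a negative weight models a deletion."""
        for i in range(len(self.counters)):
            self.counters[i] += weight * self._sign(i, value)


def estimate_join_size(r, s):
    """Estimate COUNT(R join S) on the sketched attribute.

    Products of matching counters are averaged within each group to reduce
    variance, and the median over groups is returned to boost confidence.
    """
    if r.coeffs != s.coeffs:
        raise ValueError("both sketches must share the same hash functions (seed)")
    products = [x * y for x, y in zip(r.counters, s.counters)]
    k = r.per_group
    means = [sum(products[g * k:(g + 1) * k]) / k for g in range(r.groups)]
    return median(means)


# Usage: sketch two streams with the same seed, then combine the sketches.
r = JoinSketch(groups=5, per_group=20, seed=1)
s = JoinSketch(groups=5, per_group=20, seed=1)
for v in [7, 7, 7]:
    r.update(v)
s.update(7, weight=5)
print(estimate_join_size(r, s))
```

Memory use is `groups * per_group` counters per stream, independent of the stream length. In this toy usage both streams contain a single distinct value, so the estimate is exact; with many distinct values the estimate is only approximate, and its error shrinks as the number of counters grows.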
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2004, Issue 2, pp. 61-65 (5 pages)
Computer Science