Research on Process of Complex Queries over Data Stream
Abstract: Providing answers to queries over continuous data streams is an important requirement in many application environments. This paper explores how to answer aggregate SQL queries over data streams approximately using limited memory. Randomized sketching techniques compute very small sketch synopses of the streams, from which approximate answers to aggregate queries are obtained with provable guarantees on the approximation error. The paper also discusses how existing histogram-based statistics can be used within the sketch method to improve the quality of the answers. The key idea is to partition the attribute domains intelligently, decomposing the sketching problem so that the query results have a suitable approximation accuracy. Both theoretical analysis and experiments indicate that sketches give more effective and more accurate answers to aggregate queries than traditional histograms.
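As a rough illustration of the randomized sketching idea the abstract refers to, the following minimal Python sketch estimates the size of an equi-join between two streams from small per-stream counters. It is not taken from the paper: the names `make_family`, `sign`, `Sketch`, `estimate_join_size` and `exact_join_size`, the cubic-polynomial hash, and the parameters `s1` and `s2` are all assumptions chosen for illustration, and the domain-partitioning refinement mentioned in the abstract is not implemented.

```python
import random
from collections import Counter
from statistics import median

P = 2**31 - 1  # prime modulus for the polynomial hash (an assumption of this sketch)


def make_family(s1, s2, seed=0):
    """Draw s2 groups of s1 random cubic polynomials; each one defines a +/-1 hash."""
    rng = random.Random(seed)
    return [[tuple(rng.randrange(1, P) for _ in range(4)) for _ in range(s1)]
            for _ in range(s2)]


def sign(coeffs, v):
    """Map value v to +1 or -1 using a cubic polynomial mod P (roughly 4-wise independent)."""
    h = 0
    for c in coeffs:  # Horner evaluation of the polynomial at v
        h = (h * v + c) % P
    return 1 if h & 1 else -1


class Sketch:
    """One counter per hash function; memory is s1 * s2 integers, independent of stream length."""

    def __init__(self, family):
        self.family = family
        self.counters = [[0] * len(group) for group in family]

    def update(self, v, count=1):
        # Process one stream element; count=-1 would model a deletion.
        for i, group in enumerate(self.family):
            for j, coeffs in enumerate(group):
                self.counters[i][j] += count * sign(coeffs, v)


def estimate_join_size(sk_r, sk_s):
    """Average counter products within each group, then take the median across groups."""
    means = [sum(x * y for x, y in zip(row_r, row_s)) / len(row_r)
             for row_r, row_s in zip(sk_r.counters, sk_s.counters)]
    return median(means)


def exact_join_size(r, s):
    """Exact answer for comparison: sum over values of freq_R(v) * freq_S(v)."""
    fr, fs = Counter(r), Counter(s)
    return sum(fr[v] * fs[v] for v in fr)


if __name__ == "__main__":
    rng = random.Random(42)
    r = [rng.randrange(20) for _ in range(500)]
    s = [rng.randrange(20) for _ in range(500)]
    family = make_family(s1=30, s2=5, seed=7)  # both streams must share the same hashes
    sk_r, sk_s = Sketch(family), Sketch(family)
    for v in r:
        sk_r.update(v)
    for v in s:
        sk_s.update(v)
    print("estimate:", estimate_join_size(sk_r, sk_s), "exact:", exact_join_size(r, s))
```

Both streams have to be sketched with the same hash family, since the estimate relies on the signs for equal values cancelling into +1 in the product. Increasing `s1` tightens the variance of each group average, and increasing `s2` raises the confidence of the median.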
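Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2004, No. 2, pp. 61-65 (5 pages)
Keywords: database management system (DBMS); data stream; data query; data processing; data set; data tuple; aggregate query; sketch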
References (10)

  • [1] Arasu A, et al. Characterizing memory requirements for queries over continuous data streams. In: Proc. 21st ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, Madison, Wisconsin, May 2002. 221-232
  • [2] Gilbert A, et al. Fast, small-space algorithms for approximate histogram maintenance. In: Proc. of the 2002 Annual ACM Symp. on Theory of Computing, 2002
  • [3] Alon N, Matias Y, Szegedy M. The Space Complexity of Approximating the Frequency Moments. In: Proc. of the 28th Annual ACM Symp. on the Theory of Computing, May 1996
  • [4] Alon N, Gibbons P B, Matias Y, Szegedy M. Tracking Join and Self-Join Sizes in Limited Storage. In: Proc. of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, May 1999
  • [5] Alon N, Matias Y, Szegedy M. The Space Complexity of Approximating the Frequency Moments. In: Proc. of the 28th Annual ACM Symp. on the Theory of Computing, May 1996
  • [6] Babcock B, et al. Models and issues in data stream systems. In: Proc. 21st ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, Madison, Wisconsin, May 2002. 1-16
  • [7] Motwani R, Widom J, et al. Query processing, approximation, and resource management in a data stream management system. In: Proc. First Biennial Conf. on Innovative Data Systems Research (CIDR), Jan. 2003
  • [8] Guha S, Koudas N. Approximating a data stream for querying and estimation: Algorithms and performance evaluation. In: Proc. of the 2002 Intl. Conf. on Data Engineering, 2002. 567-576
  • [9] Dobra A, Garofalakis M. Processing Complex Aggregate Queries over Data Streams. ACM SIGMOD, June 2002
  • [10] Chandrasekaran S, Franklin M. Streaming queries over streaming data. In: Proc. 28th Intl. Conf. on Very Large Data Bases, Aug. 2002
