Research on Process of Complex Queries over Data Stream

Abstract: Answering queries over continuous data streams is an essential requirement in many application environments. This paper explores how to obtain approximate answers to aggregate SQL queries over data streams using limited memory. Randomized sketching techniques are used to compute very small sketch synopses of the streams, from which approximate answers to aggregate queries are obtained with provable guarantees on the approximation error. The paper also discusses how existing histogram-based statistics can be used within the sketch method to improve the quality of the answers. The key idea is to partition the attribute domains intelligently and thereby decompose the sketching problem, so that the query results carry reasonable guarantees on the quality of approximation. Both theoretical analysis and experiments show that sketches provide significantly more accurate and more effective answers to aggregate queries than traditional histograms.
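The sketching idea summarized in the abstract can be illustrated with a small example. The following is a minimal Python sketch of one well-known randomized technique of this kind, an AMS-style (Alon, Matias, Szegedy) sketch used to estimate the size of an equi-join between two streams. It is not the paper's own implementation: the names `AMSSketch` and `estimate_join_size`, the parameters `groups`, `copies`, and `seed`, and the cubic-polynomial sign hash are all assumptions made for illustration, and the domain-partitioning refinement mentioned in the abstract is not shown.

```python
import random
from statistics import median

PRIME = 2**31 - 1  # modulus for the illustrative hash family (assumption)


class AMSSketch:
    """Hypothetical AMS-style sketch: groups * copies signed counters."""

    def __init__(self, groups, copies, seed):
        rng = random.Random(seed)  # same seed => same sign functions
        self.groups = groups
        self.copies = copies
        total = groups * copies
        # One random cubic polynomial per counter; a cubic over a prime
        # field is a standard way to get (nearly) four-wise independent
        # values. Taking the low bit as the sign is a simplification.
        self.coeffs = [
            tuple(rng.randrange(PRIME) for _ in range(4)) for _ in range(total)
        ]
        self.counters = [0] * total

    def _sign(self, k, value):
        a, b, c, d = self.coeffs[k]
        h = (((a * value + b) * value + c) * value + d) % PRIME
        return 1 if h & 1 else -1

    def update(self, value, weight=1):
        """Process one stream element; weight=-1 models a deletion."""
        for k in range(len(self.counters)):
            self.counters[k] += weight * self._sign(k, value)


def estimate_join_size(r, s):
    """Estimate COUNT(R join S) from two sketches built with the same seed.

    Each product of paired counters is an unbiased estimate; averaging
    within a group reduces variance, and the median across groups
    improves the confidence of the final answer.
    """
    products = [x * y for x, y in zip(r.counters, s.counters)]
    means = [
        sum(products[g * r.copies:(g + 1) * r.copies]) / r.copies
        for g in range(r.groups)
    ]
    return median(means)


if __name__ == "__main__":
    r = AMSSketch(groups=5, copies=20, seed=42)
    s = AMSSketch(groups=5, copies=20, seed=42)
    for v in [1, 1, 2, 3, 3, 3]:
        r.update(v)
    for v in [1, 3, 3, 4]:
        s.update(v)
    # The exact join size for these two toy streams is 2*1 + 3*2 = 8.
    print("estimated join size:", estimate_join_size(r, s))
```

Each sketch occupies `groups * copies` integers regardless of stream length, which is the memory bound the approach relies on. Both sketches must share the same seed so that they use identical sign functions; increasing `copies` tightens the error, and increasing `groups` raises the confidence.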

Authors: Wei Dingguo (魏定国), Wu Shilin (吴时霖)

Affiliations: Department of Computer Science and Engineering, Fudan University; Guangdong University of Business Studies

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 2, pp. 61-65 (5 pages)

Keywords: database management system (DBMS), data stream, data query, data processing, data set, data tuple, aggregate query, sketch

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
