
Max/Min Online Aggregation in the Cloud
Abstract: Data exploration is an important part of data analysis, and it must retrieve the key indicators of a data set efficiently, such as the maximum/minimum and the mean. In a relational database these indicators are obtained through SQL aggregate functions. To make aggregation efficient over massive data, researchers in the relational database community proposed online aggregation. In the era of big data, online aggregation in the cloud has begun to attract attention. Existing work on online aggregation in the cloud, however, focuses mainly on aggregate functions such as Count and Sum, and Max/Min online aggregation has received little attention. This paper uses Chebyshev's inequality and the central limit theorem to measure the accuracy of Max/Min online aggregation in terms of quantiles. Experimental results show that the method adapts well to online aggregation in big data environments and scales well.
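The abstract names the ingredients (Chebyshev's inequality, quantiles) but not the algorithm, so the following is only a minimal sketch of how a quantile can express the accuracy of a running maximum. It is not the paper's method. The function name `max_quantile_lower_bound` is hypothetical, the sample is assumed to be a uniform random draw from the full data set, and the sample mean and standard deviation are used as stand-ins for the population values, so the bound is approximate.

```python
import statistics


def max_quantile_lower_bound(sample):
    """Approximate lower bound on the quantile rank of the sample maximum.

    Chebyshev's inequality gives P(|X - mu| >= k * sigma) <= 1 / k**2, so at
    most a 1 / k**2 fraction of values lies at or above mu + k * sigma.
    Assumption: sample mean/std substitute for the unknown population values.
    """
    mean = statistics.fmean(sample)
    std = statistics.pstdev(sample)
    current_max = max(sample)
    if std == 0:
        # All sampled values are equal: the inequality gives no information.
        return 0.0
    k = (current_max - mean) / std
    if k <= 1:
        # Chebyshev's bound is trivial (>= 1) for k <= 1.
        return 0.0
    return 1.0 - 1.0 / (k * k)


if __name__ == "__main__":
    processed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
    bound = max_quantile_lower_bound(processed)
    print(f"running max = {max(processed)}, quantile rank >= {bound:.3f}")
```

In an online setting, a query could report the running maximum together with this bound after each batch and stop once the bound reaches a threshold the user accepts. A Min estimate follows by applying the same function to the negated values.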
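Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2015, No. 10, pp. 2177-2182 (6 pages)
Funding: National Natural Science Foundation of China (61379050, 91224008); National High Technology Research and Development Program of China ("863" Program) (2013AA013204); Specialized Research Fund for the Doctoral Program of Higher Education (20130004130001); Research Funds of Renmin University of China (11XNL010)
Keywords: online aggregation; cloud computing; Chebyshev's inequality; central limit theorem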

