
Flexisample: A Personalized Approximate Aggregate Query System
Abstract: Interactive query analysis over big data places strict demands on query latency. Approximate computing services based on sampling reduce query latency by trading away some accuracy, and they are broadly applicable to approximate query analysis over big data. Flexisample, the system described in this paper, is a sampling-based personalized approximate aggregate query system. It implements parsing and rewriting of query requests together with a logical sample combination strategy, which lets it serve personalized multidimensional aggregate queries. To satisfy diverse personalized aggregate queries while preserving a degree of accuracy, Flexisample maintains a set of stratified samples with an optimized design. To extend the samples' coverage along the time dimension, the system maintains and updates the stratified samples from online data streams. The system was applied to aggregate queries over power quality data. The results show that, for multiple personalized aggregate query requests under query latency constraints, the system meets the personalized query needs of business users while effectively reducing query latency: with a time cost below 7% of that of a full-data query, query accuracy in every stratum exceeded 88%, and the sample storage space was 87.5% smaller than with direct storage.
Authors: ZHAO Bo (赵博); ZUO Changqi (左昌麒); FANG Jun (房俊) (School of Information, North China University of Technology, Beijing 100144; Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, Beijing 100144)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2021, No. 12, pp. 2431-2436 (6 pages)
Funding: National Natural Science Foundation of China, International (Regional) Cooperation and Exchange Program (No. 62061136006); National Key R&D Program of China (No. 2018YFB1402500).
Keywords: approximate computing; aggregate query; stratified sampling; sample maintenance
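
The abstract describes answering aggregate queries from stratified samples instead of the full data. The record does not give Flexisample's actual algorithms, so the following is only a minimal, generic sketch of that idea: a textbook stratified-sample estimator, not the authors' implementation. All function names, the fixed per-stratum sample size, and the `station` and `voltage` field names used in the note below are hypothetical.

```python
import random
from collections import defaultdict


def build_stratified_sample(rows, key, per_stratum, seed=0):
    """Group rows by `key` and keep at most `per_stratum` rows per group.

    Returns {stratum: (population_size, sampled_rows)}.
    The fixed per-stratum size is an assumption made for illustration.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        stratum: (len(members),
                  rng.sample(members, min(per_stratum, len(members))))
        for stratum, members in groups.items()
    }


def approx_sum(sample, value, predicate=lambda row: True):
    """Estimate SUM(value) WHERE predicate.

    Each stratum's sample sum is scaled by population_size / sample_size.
    """
    total = 0.0
    for population_size, sampled in sample.values():
        if not sampled:
            continue
        scale = population_size / len(sampled)
        total += scale * sum(r[value] for r in sampled if predicate(r))
    return total


def approx_count(sample, predicate=lambda row: True):
    """Estimate COUNT(*) WHERE predicate with the same per-stratum scaling."""
    total = 0.0
    for population_size, sampled in sample.values():
        if not sampled:
            continue
        scale = population_size / len(sampled)
        total += scale * sum(1 for r in sampled if predicate(r))
    return total


def approx_avg(sample, value, predicate=lambda row: True):
    """Estimate AVG(value) WHERE predicate as estimated SUM / estimated COUNT."""
    count = approx_count(sample, predicate)
    return approx_sum(sample, value, predicate) / count if count else float("nan")
```

For instance, with rows shaped like `{"station": "A", "voltage": 1.0}`, calling `build_stratified_sample(rows, "station", 100)` once and then `approx_avg(sample, "voltage")` for each query touches only the retained rows. When a stratum is kept in full, its contribution to the estimate is exact.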
