Abstract
Interactive query analysis over big data demands low query latency. Sampling-based approximate computing services reduce query latency at the cost of some accuracy, which makes them broadly applicable and promising for approximate query analysis over big data. This paper presents Flexisample, a sampling-based system for personalized approximate aggregate queries. It parses and rewrites incoming query requests and applies a logical sample-combination strategy, which lets it serve personalized multidimensional aggregate queries. To answer diverse personalized aggregate queries while maintaining acceptable accuracy, Flexisample keeps a set of optimized stratified samples. To extend the samples' coverage along the time dimension, the system maintains and updates these stratified samples from online data streams. The system was applied to aggregate queries over power quality data. The results show that, under multiple personalized aggregate query requests and query latency constraints, the system meets business users' personalized query needs while effectively reducing query latency. With a time cost of less than 7% of that of a full query, the query accuracy of every stratum exceeded 88%. The sample storage space was also reduced by 87.5% compared with direct storage.
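The abstract does not say how the stratified samples are maintained from the stream or how aggregates are estimated from them. As a hedged illustration only, the sketch below keeps a fixed-size uniform reservoir sample (Algorithm R) per stratum as rows arrive. It answers SUM and AVG queries by scaling each stratum's sample by its population-to-sample ratio N_h/n_h and then adding the per-stratum estimates. The class and method names (`StratifiedReservoir`, `insert`, `approx_sum`, `approx_avg`, `approx_total_sum`), the use of reservoir sampling, and the synthetic "station" stream are assumptions for illustration, not Flexisample's actual design.

```python
import random
from collections import defaultdict


class StratifiedReservoir:
    """Hypothetical sketch: one reservoir sample per stratum, fed by a stream."""

    def __init__(self, capacity_per_stratum, seed=None):
        self.capacity = capacity_per_stratum
        self.rng = random.Random(seed)
        self.samples = defaultdict(list)  # stratum key -> sampled values (n_h <= capacity)
        self.seen = defaultdict(int)      # stratum key -> rows seen so far (N_h)

    def insert(self, stratum, value):
        """Algorithm R: keep a uniform fixed-size sample within each stratum."""
        self.seen[stratum] += 1
        reservoir = self.samples[stratum]
        if len(reservoir) < self.capacity:
            reservoir.append(value)
        else:
            # Replace a random slot with probability capacity / N_h.
            j = self.rng.randrange(self.seen[stratum])
            if j < self.capacity:
                reservoir[j] = value

    def approx_sum(self, stratum):
        """Estimate the stratum total as (sample sum) * N_h / n_h."""
        reservoir = self.samples.get(stratum, [])
        if not reservoir:
            return 0.0
        return sum(reservoir) * self.seen.get(stratum, 0) / len(reservoir)

    def approx_avg(self, stratum):
        """The sample mean is an unbiased estimate of the stratum mean."""
        reservoir = self.samples.get(stratum, [])
        return sum(reservoir) / len(reservoir) if reservoir else float("nan")

    def approx_total_sum(self):
        """Stratified estimator of the global SUM: add the per-stratum estimates."""
        return sum(self.approx_sum(h) for h in self.samples)


if __name__ == "__main__":
    sampler = StratifiedReservoir(capacity_per_stratum=100, seed=42)
    for i in range(10_000):
        # Hypothetical stream: 4 monitoring stations, readings cycling through 0..49.
        sampler.insert(f"station-{i % 4}", float(i % 50))
    print(f"estimated SUM = {sampler.approx_total_sum():.1f} (exact: 245000)")
```

Each stratum gets its own reservoir, so sparse strata are never crowded out by dense ones. This is the usual reason to prefer stratified sampling over a single uniform sample for group-by style aggregation.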
Authors
ZHAO Bo
ZUO Changqi
FANG Jun
(School of Information, North China University of Technology, Beijing 100144; Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, Beijing 100144)
Source
Computer & Digital Engineering, 2021, No. 12, pp. 2431-2436 (6 pages)
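Funding
International (Regional) Cooperation and Exchange Program of the National Natural Science Foundation of China (No. 62061136006)
National Key R&D Program of China (No. 2018YFB1402500)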
Keywords
approximate computing
aggregate query
stratified sampling
sample maintenance