Funding: Supported in part by the National Natural Science Foundation of China (62125202).
Abstract: The emergence and wide adoption of new memory hardware have brought significant changes to the conventional vertical memory hierarchy, which fails to handle contention for shared hardware resources and expensive data movement. To deal with these problems, existing schemes rely on inefficient scheduling strategies that also incur extra temporal, spatial, and bandwidth overheads. Based on the insight that shared hardware resources tend to be offered uniformly and hierarchically to the requests of co-located applications in memory systems, we present an efficient abstraction of memory hierarchies, called Label, which establishes a connection between the application layer and the underlying hardware layer. Building on labels, this paper proposes LaMem, a labeled, resource-isolated, cross-tiered memory system that leverages way-based partitioning of shared resources to guarantee the QoS demands of applications while supporting fast, low-overhead cache repartitioning. Furthermore, as a case study to verify LaMem's applicability, we customize LaMem for the learned index, which fundamentally replaces storage structures with computation models. Experimental results demonstrate the efficiency and efficacy of LaMem.
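The way-based partitioning mentioned above can be illustrated with a minimal sketch. The label names, weights, and the 16-way cache below are hypothetical, not LaMem's actual interface; the mechanism is in the spirit of hardware features such as Intel CAT, where each partition is a contiguous bitmask over cache ways and repartitioning only rewrites masks rather than migrating data.

```python
# Hypothetical sketch of way-based cache partitioning (not LaMem's API):
# each label receives a disjoint, contiguous bitmask of cache ways.

TOTAL_WAYS = 16  # assume a 16-way set-associative shared cache

def partition_ways(labels):
    """Map (label, weight) pairs to disjoint way bitmasks, one way minimum each."""
    total_weight = sum(weight for _, weight in labels)
    masks, start = {}, 0
    for idx, (name, weight) in enumerate(labels):
        remaining_labels = len(labels) - idx - 1
        n = max(1, round(TOTAL_WAYS * weight / total_weight))
        n = min(n, TOTAL_WAYS - start - remaining_labels)  # leave ways for the rest
        masks[name] = ((1 << n) - 1) << start              # contiguous run of n ways
        start += n
    return masks

masks = partition_ways([("latency-critical", 3), ("batch", 1)])
```

Because each mask is just an integer, changing a label's allocation is a single register-style write; cached lines outside the new mask are evicted lazily, which is what makes repartitioning cheap.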
Abstract: In the big-data era, data access speed is an important measure of the performance of large-scale storage systems, and indexing is one of the main techniques for improving data access performance in database systems. In recent years, the learned index (LI) has been proposed: it replaces traditional indexes such as the B+-tree with machine learning models that fit the data distribution, turning indirect data lookup into direct function computation. LI improves query speed and reduces index space overhead, but it suffers from a large fitting error and does not support modifying operations such as insertion. This paper proposes GDLIN (A Learned Index by Gradient Descent), a learned index model that fits the data using gradient descent. GDLIN uses gradient descent to fit the data more closely, reducing the fitting error and shortening local-search time; it also invokes the data-fitting algorithm recursively to exploit the key distribution when building the upper-level structure, preventing the index from growing with the data volume. In addition, GDLIN uses linked lists to address LI's lack of insertion support. Experimental results show that GDLIN achieves 2.1x the throughput of the B+-tree when no new data is inserted, and 1.08x that of LI when insertions account for 50% of operations.
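The core idea behind such a gradient-descent learned index can be sketched as follows. This is an illustrative toy, not the paper's implementation (GDLIN's model and recursive structure are more elaborate): fit position ≈ a·key + b by batch gradient descent on the key array, then correct the prediction with a short local search.

```python
# Toy learned index: fit a linear key->position model by gradient descent,
# then answer lookups with predict + bounded local search.

def fit(keys, lr=0.5, epochs=2000):
    """Fit position ~ a*key + b by batch gradient descent on the MSE loss."""
    n = len(keys)
    lo, hi = keys[0], keys[-1]
    xs = [(k - lo) / (hi - lo) for k in keys]  # keys normalized to [0, 1]
    ys = [i / (n - 1) for i in range(n)]       # positions normalized to [0, 1]
    a, b = 0.0, 0.0
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            err = a * x + b - y                # residual of the current model
            ga += err * x
            gb += err
        a -= lr * ga / n                       # gradient step
        b -= lr * gb / n
    return a, b, lo, hi

def lookup(keys, model, key):
    """Predict a position, then fix it up with a local search."""
    a, b, lo, hi = model
    n = len(keys)
    x = (key - lo) / (hi - lo)
    i = max(0, min(n - 1, round((a * x + b) * (n - 1))))
    while i > 0 and keys[i] > key:
        i -= 1
    while i < n - 1 and keys[i] < key:
        i += 1
    return i if keys[i] == key else None

keys = list(range(0, 1000, 7))
model = fit(keys)
```

The smaller the fitting error, the shorter the local-search loops run, which is exactly the cost GDLIN targets by fitting the data distribution more closely.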
Funding: National Key R&D Program of China (2018YFC0831700), NSFC Project (61972278), Natural Science Foundation of Tianjin (20JCQNJC01620), and the Browser Project (CEIEC-2020-ZM02-0132).
Abstract: We propose an approach to underpin interactive visual exploration of large data volumes by training a Learned Visualization Index (LVI). Knowing in advance the data, the aggregation functions used for visualization, the visual encoding, and the available interactive operations for data selection, LVI makes it possible to avoid time-consuming retrieval and processing of raw data in response to user interactions. Instead, LVI directly predicts the aggregates of interest for the user's data selection. We demonstrate the efficiency of the proposed approach on two use cases of spatio-temporal data at different scales.
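A toy sketch of the idea (illustrative only; real LVI models are trained for specific encodings and interactions): instead of scanning raw records for every brushed value range, a model of the data's cumulative distribution predicts the count aggregate directly. For roughly uniform data a linear CDF model suffices; the data and range below are made up for the example.

```python
# Predict a count aggregate for a range selection from a CDF model,
# without touching the raw records at query time.

import random

random.seed(0)
data = sorted(random.uniform(0.0, 100.0) for _ in range(10000))
lo, hi = data[0], data[-1]

def predicted_count(a, b):
    """Predict |{x : a <= x < b}| from a linear CDF model of the data."""
    def cdf(v):
        return min(1.0, max(0.0, (v - lo) / (hi - lo)))
    return round(len(data) * (cdf(b) - cdf(a)))

exact = sum(1 for x in data if 20.0 <= x < 30.0)   # what a raw scan would return
approx = predicted_count(20.0, 30.0)               # what the model predicts
```

The prediction is approximate, which is the trade-off the approach makes: a small aggregation error in exchange for interaction latency that is independent of the data volume.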
Funding: Supported by the National Natural Science Foundation of China under Grant No. 62072419 and the Huawei-USTC Joint Innovation Project on Fundamental System Software.
Abstract: The recently proposed learned index offers higher query performance and space efficiency than the conventional B+-tree. However, the original learned index supports neither insertions nor bounded query complexity. Some variants use an out-of-place strategy and a bottom-up build strategy to accelerate insertions and bound query complexity, but they introduce additional query costs and frequent node-splitting operations. Moreover, none of the existing learned indices is cache-friendly. In this paper, aiming to support efficient queries and insertions while offering bounded query complexity, we propose a new learned index called COLIN (Cache-cOnscious Learned INdex). Unlike previous solutions that use an out-of-place strategy, COLIN adopts an in-place approach to support insertions and reserves empty slots in each node to optimize the node's data placement. In particular, through model-based data placement and a cache-conscious data layout, COLIN decouples the local-search boundary from the maximum error of the model. Experimental results on five workloads and three datasets show that COLIN achieves the best read/write performance among all compared indices, outperforming the second-best index by 18.4%, 6.2%, and 32.9% on the three datasets, respectively.
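Model-based placement into a gapped node can be sketched as below. This is in the spirit of COLIN but not its actual layout (the linear model, fill factor, and right-shift-only policy are simplifying assumptions): keys are bulk-loaded at their model-predicted slots with reserved gaps, lookups probe right from the prediction, and inserts are absorbed in place by a nearby gap instead of an out-of-place buffer.

```python
# Gapped node with model-based data placement: keys sit at (or just after)
# the slot a linear model predicts, with empty slots reserved for inserts.

class GappedNode:
    def __init__(self, keys, fill=0.7):
        """Bulk-load sorted keys, leaving ~30% of the slots empty."""
        self.cap = max(1, int(len(keys) / fill))
        self.lo, self.hi = keys[0], keys[-1]
        self.slots = [None] * self.cap
        for k in keys:                          # keys arrive sorted, so taking the
            i = self._predict(k)                # predicted slot or the next free one
            while self.slots[i] is not None:    # to the right preserves sorted order
                i += 1
            self.slots[i] = k

    def _predict(self, key):
        frac = (key - self.lo) / (self.hi - self.lo)
        return min(self.cap - 1, max(0, round(frac * (self.cap - 1))))

    def lookup(self, key):
        """Probe right from the prediction; keys never sit left of it."""
        i = self._predict(key)
        while i < self.cap:
            if self.slots[i] == key:
                return i
            if self.slots[i] is not None and self.slots[i] > key:
                return None
            i += 1
        return None

    def insert(self, key):
        q = self._predict(key)                  # find first slot with a key >= key
        while q < self.cap and (self.slots[q] is None or self.slots[q] < key):
            q += 1
        if q > self._predict(key) and self.slots[q - 1] is None:
            self.slots[q - 1] = key             # a reserved gap absorbs the insert
            return
        g = q                                   # shift the occupied run [q, g)
        while g < self.cap and self.slots[g] is not None:
            g += 1
        if g == self.cap:                       # simplification: treat as full;
            raise OverflowError("node full")    # a real index would split/retrain
        for j in range(g, q, -1):
            self.slots[j] = self.slots[j - 1]
        self.slots[q] = key

node = GappedNode(list(range(0, 100, 2)))
node.insert(51)
```

Because shifting stops at the nearest gap, an insert touches only a handful of adjacent slots, and the probe distance from the predicted slot stays short regardless of the model's maximum error, which is the decoupling effect the abstract describes.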