摘要
大数据背景下剧增的数据给经典的内存索引技术带来了巨大挑战,为了实现对海量数据的高性能索引,工业界和学术界分别从设备和结构角度推出了高性能大容量的非易失型内存(Non-Volatile Memory,NVM)和受机器学习启发的学习索引(Learned Index,LI).然而目前基于NVM的学习索引结构的相关研究非常稀少,在如何结合NVM和LI来高效地索引海量数据方面还有许多问题需要解决.本文提出了一种基于NVM的新型智能索引结构LI-Tree,充分发挥了两者的优势.具体的,LI-Tree可分为三层:由机器学习模型组成的能够提高LI-Tree单点性能的模型层、由静态数组构成的减少NVM写的数据索引层和由一系列轻量级B+树组成以避免模型层插入时频繁重训练的数据层.在真实设备上评估表明,LI-Tree相比传统B+树,插入、查询和删除性能分别提高了70%、30%和130%.另外,LI-Tree与学习索引结构ALEX,PGM-Index和XIndex对比,插入性能分别提升了80%,130%和150%.
The dramatic growth of data volumes in the era of big data poses challenges to in-memory indexing techniques in terms of both devices and data structures.Commercial Non-Volatile Memory(NVM)devices are now available as the supplement of or alternative for memory,and the new Learned Index(LI)data structures are proposed for indexing massive data in read-heavy scenarios.However,naively applying LI on NVMs is not efficient due to the heavy retraining overhead for LI when data size growing.In this work,we present the LI-tree,an NVM-oriented learned index structure that maximizes LI′s write performance while fully exploiting LI′s read performance.Specifically,the LI-Tree consists of three layers:a model layer composed of machine learning models that target on improve point query performance,a data index layer consists of static arrays to reduce NVM write operations,and a data layer that uses lightweight B+trees to alleviate the training overhead of the other two layers for insert operations.Evaluation results on real NVM devices show that LI-Tree improves the performance of insert,query,and delete operations by 70%,30%,and 130%,respectively,compared to the traditional B+tree.LI-Tree improves the insert performance by 80%,130%,and 150%when compared to ALEX,PGM-Index,and XIndex,respectively.
作者
王中华
舒碧华
陈书宁
刘瀚阳
崔秋
万继光
WANG Zhong-hua;SHU Bi-hua;CHEN Shu-ning;LIU Han-yang;CUI Qiu;WAN Ji-guang(Wuhan National Laboratory for Optoelectronics,Huazhong University of Science and Technology,Wuhan 430074,China;PingCAP,Beijing 100192,China)
出处
《小型微型计算机系统》
CSCD
北大核心
2023年第6期1329-1337,共9页
Journal of Chinese Computer Systems
基金
国家自然科学基金面上项目(62072196)资助
深圳市科技计划基础研究面上项目(JCYJ20190809095001781)资助
国家自然科学基金创新研究群体项目(61821003)资助.
关键词
非易失内存
索引结构
学习索引
B+树
键值存储
non-volatile memory
indexing structure
learned index
B+tree
key-value storage