Funding: This work was supported by the Youth Program of the National Science Foundation of China (61702189).
Abstract: Many key-value stores use RDMA to optimize messaging and data transmission between the application layer and the storage layer, but most of them provide only point operations. A skiplist-based store can support both point operations and range queries, but its CPU-intensive access operations, combined with a high-speed network, easily drive the storage layer into a CPU bottleneck. The common solution is to offload some operations to the application layer and use RDMA to perform remote access directly, bypassing the CPU, but this method has been used only in hash table-based stores. In this paper, we present RS-store, a skiplist-based key-value store with RDMA that overcomes the CPU bottleneck of the storage layer by enabling two access modes: local access and remote access. In RS-store, we design a new data structure, R-skiplist, to reduce the communication cost of remote access, and implement a latch-free concurrency control mechanism to keep concurrent operations in the two access modes correct. RS-store also supports client-active range queries, which reduce the storage layer's CPU consumption. Finally, we evaluate RS-store on an RDMA-capable cluster. Experimental results show that RS-store achieves up to 2x improvement over RDMA-enabled RocksDB in throughput and application scalability.
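To illustrate why a skiplist can serve both point operations and range queries, here is a minimal single-threaded, in-memory sketch. It is not the paper's R-skiplist: it does not model RDMA, remote access, or latch-free concurrency control, and every name in it (`SkipList`, `put`, `get`, `range`, `MAX_HEIGHT`) is an assumption made for illustration.

```python
import random


class Node:
    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.next = [None] * height  # one forward pointer per level


class SkipList:
    MAX_HEIGHT = 12  # assumed cap on tower height

    def __init__(self, seed=0):
        self.head = Node(None, None, self.MAX_HEIGHT)  # sentinel, holds no data
        self.rng = random.Random(seed)

    def _random_height(self):
        # Each extra level is kept with probability 1/2.
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _find_predecessors(self, key):
        # For every level, find the last node whose key is strictly below `key`.
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def put(self, key, value):
        preds = self._find_predecessors(key)
        candidate = preds[0].next[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite an existing key
            return
        new = Node(key, value, self._random_height())
        for level in range(len(new.next)):
            new.next[level] = preds[level].next[level]
            preds[level].next[level] = new

    def get(self, key):
        # Point lookup: descend the towers, then check the bottom-level successor.
        node = self._find_predecessors(key)[0].next[0]
        if node is not None and node.key == key:
            return node.value
        return None

    def range(self, lo, hi):
        # Range query: locate `lo` once, then walk the sorted bottom level up to `hi`.
        node = self._find_predecessors(lo)[0].next[0]
        result = []
        while node is not None and node.key <= hi:
            result.append((node.key, node.value))
            node = node.next[0]
        return result
```

Because the bottom level is a sorted linked list, `range(lo, hi)` pays for one search and then a sequential walk. A hash table has no such key order to walk, which is why it is limited to point operations.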
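Abstract: Persistent memory (PMEM) combines the low-latency byte addressability of memory with the persistence of disk, and will bring revolutionary change and far-reaching impact to existing software architectures. Distributed storage is widely used in cloud computing and data centers, but existing backend storage engines, typified by Ceph BlueStore, were designed for traditional hard disks and solid-state disks (SSDs), and their optimizations do not let the advantages of PMEM come into play. This paper proposes MixStore, a backend store based on persistent memory and SSD. Using volatile-range marking and a pending-deletion list, it implements a concurrent skiplist suited to persistent memory that replaces RocksDB for metadata management. While guaranteeing transactional consistency, this eliminates problems such as the performance jitter caused by BlueStore's compaction and improves concurrent metadata access performance. Through a data object storage design coupled with the metadata management mechanism, unaligned small data objects are placed in PMEM and aligned large data objects are stored on SSD, which makes full use of PMEM's byte addressability and persistence and of SSD's large capacity and low cost. Combined with deferred writes and copy-on-write (CoW), the data update strategy is optimized to eliminate the write amplification caused by BlueStore's WAL and to improve small-write performance. Test results show that, on the same hardware, MixStore improves write throughput by 59% and reduces write latency by 37% compared with BlueStore, effectively improving system performance.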
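The placement rule described above (unaligned small objects to PMEM, aligned large objects to SSD) can be sketched as a simple routing function. This is only an illustration under stated assumptions: the 4096-byte alignment unit, the function name `choose_device`, and the string device labels are all hypothetical, and the abstract does not say how MixStore actually makes this decision.

```python
BLOCK_SIZE = 4096  # assumed alignment unit; the abstract does not specify one


def choose_device(offset: int, length: int, block_size: int = BLOCK_SIZE) -> str:
    """Return "ssd" for block-aligned writes and "pmem" for everything else."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # A write is treated as aligned only when both its start and its size
    # are whole multiples of the block size.
    aligned = offset % block_size == 0 and length % block_size == 0
    return "ssd" if aligned else "pmem"
```

For example, `choose_device(0, 4096)` routes a full aligned block to SSD, while `choose_device(0, 100)` sends a small write to PMEM, where byte addressability avoids rewriting a whole block.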