Near-Data Processing-Based Parallel Compaction Optimization for Key-Value Stores
(Original title: 近数据计算下键值存储中Compaction并行优化方法)
Abstract: The explosive growth of large-scale unstructured data poses major challenges to traditional relational databases. Key-value stores based on the log-structured merge tree (LSM-tree) are widely used and play an essential role in data-intensive applications, because an LSM-tree converts random writes into sequential writes and thereby improves write performance. However, LSM-tree key-value stores have performance problems of their own. First, they rely on compaction to update data and keep the system balanced, but compaction causes severe write amplification. Second, under the traditional compute-centric architecture, compaction incurs a large amount of data transfer, which limits overall system performance. Building on the data-centric near-data processing (NDP) model, this paper exploits the parallel resources of the host and of the NDP-enabled device and proposes CoPro, a collaborative parallel compaction optimization for LSM-tree key-value stores that uses two kinds of parallelism: data parallelism and pipeline parallelism. When a compaction is triggered, the host-side CoPro determines the partitioning ratio of the compaction task according to the offloading strategy and divides the task by that ratio. The resulting compaction subtasks are then offloaded to the host side and the device side, respectively, through the semantic management module, so that both sides execute the compaction cooperatively. The paper further proposes CoPro+, which adds a decision component to the host-side and device-side CoPro. CoPro+ dynamically adjusts the degree of parallelism according to changes in system resources and in the value size of the key-value pairs in the workload, so that the computing resources of the NDP architecture are used more efficiently. Extensive experiments on a purpose-built hardware platform validate the benefits of CoPro compared with two popular NDP-based key-value stores.
Authors: Sun Hui (孙辉); Lou Bendong (娄本冬); Huang Jianzhong (黄建忠); Zhao Yuhong (赵雨虹); Fu Song (符松) (School of Computer Science and Technology, Anhui University, Hefei 230601; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; Department of Computer Science and Engineering, University of North Texas, Denton, TX, USA 76203)
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2022, No. 3, pp. 597-616 (20 pages)
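Funding: University Synergy Innovation Program of Anhui Province (GXXT-2019-007); Open Project of the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCH201915); National Natural Science Foundation of China (62072001, 61702004, 61572209).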
Keywords: log-structured merge tree (LSM-tree); key-value store; near-data processing (NDP); task offloading; data-pipeline parallelism
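
The ratio-based task division described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`merge_runs`, `split_by_ratio`, `collaborative_compaction`), the fixed `host_ratio` parameter, and the use of two threads as stand-ins for the host and the NDP device are all assumptions made for illustration. The sketch only shows the data-parallel split; pipeline parallelism and the CoPro+ decision component are not modeled.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def merge_runs(runs):
    """Merge sorted runs of (key, value) pairs into one sorted run.

    Runs are ordered newest first; on duplicate keys the newest value wins.
    Keys are assumed to be unique within a single run.
    """
    # Tag entries with the run index so equal keys sort newest-first.
    tagged = [[(k, i, v) for k, v in run] for i, run in enumerate(runs)]
    out = []
    last_key = object()  # sentinel that equals no real key
    for k, _, v in heapq.merge(*tagged):
        if k != last_key:
            out.append((k, v))
            last_key = k
    return out


def split_by_ratio(runs, host_ratio):
    """Split the key space so roughly host_ratio of the distinct keys go to the host.

    The host receives the lower key range and the device the upper range,
    so the two merged outputs can simply be concatenated.
    """
    keys = sorted({k for run in runs for k, _ in run})
    cut = int(len(keys) * host_ratio)
    host_keys = set(keys[:cut])
    host = [[kv for kv in run if kv[0] in host_keys] for run in runs]
    device = [[kv for kv in run if kv[0] not in host_keys] for run in runs]
    return host, device


def collaborative_compaction(runs, host_ratio=0.5):
    """Run the two compaction subtasks concurrently and join their outputs."""
    host_part, device_part = split_by_ratio(runs, host_ratio)
    with ThreadPoolExecutor(max_workers=2) as pool:
        host_future = pool.submit(merge_runs, host_part)
        # Hypothetical stand-in: a real system would offload this to the NDP device.
        device_future = pool.submit(merge_runs, device_part)
        return host_future.result() + device_future.result()


newer = [("a", 1), ("c", 3)]
older = [("a", 0), ("b", 2), ("d", 4)]
result = collaborative_compaction([newer, older], host_ratio=0.5)
```

In this sketch a larger `host_ratio` shifts more of the key range to the host worker. A dynamic scheme in the spirit of CoPro+ would recompute that ratio from observed resource usage and value sizes instead of fixing it.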