Abstract
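To address the problem that data indexes in distributed storage systems in cloud computing environments do not support complex queries, this paper proposes M-Index, a multi-dimensional data indexing mechanism. M-Index uses the pyramid technique to map the multi-dimensional metadata of a data item to a one-dimensional index. On this basis, the concept of a prefix binary tree (PBT) is proposed for the first time, and the prefix extracted from the one-dimensional index and the valid PBT nodes serves as the primary key of the data item in the storage system. Data are published to the overlay network formed by the storage nodes according to the primary key and a consistent hashing mechanism. A data query algorithm based on M-Index is designed that converts complex query requests into one-dimensional query keys, effectively supporting complex query modes such as multi-dimensional queries and range queries. Theoretical analysis and experiments show that M-Index achieves good query efficiency and load balancing under complex query modes.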
Data indexing is one of the most important techniques for distributed storage systems in cloud computing environments, because application data are partitioned among the storage nodes of a data center. With the rapid development of Web applications, query requests over metadata have become increasingly complex. However, state-of-the-art indexing mechanisms for distributed storage systems cannot support complex queries such as multi-dimensional queries and range queries. To address this issue, we first define the prefix binary tree (PBT) to support range query processing. We then propose M-Index, a multi-dimensional indexing mechanism for complex queries in cloud computing, which combines the pyramid technique with the PBT to transform multi-dimensional metadata into a single-dimensional key. Data are distributed over an overlay network according to this key and consistent hashing, which enables efficient publication and retrieval of data. On this basis, we propose a query algorithm based on M-Index that supports multi-dimensional queries and range queries. Theoretical analysis shows that M-Index handles complex queries efficiently and returns complete query results. Furthermore, the experimental results demonstrate that our indexing mechanism outperforms existing related mechanisms in query efficiency and load balancing.
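The two building blocks named in the abstract, the pyramid technique and consistent hashing, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the PBT prefix extraction is omitted because the abstract does not specify it, and `storage_key` (which simply rounds the pyramid value to a fixed-precision string), the `replicas` parameter, and the node names are hypothetical choices made for illustration. The pyramid value follows the commonly published definition of the pyramid technique for points normalized to [0, 1]^d.

```python
import bisect
import hashlib


def pyramid_value(point):
    """Map a point in [0, 1]^d to a one-dimensional pyramid value."""
    d = len(point)
    # Dimension whose coordinate deviates most from the centre 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    # Pyramids 0..d-1 lie below the centre, pyramids d..2d-1 at or above it.
    i = j_max if point[j_max] < 0.5 else j_max + d
    # Height of the point inside its pyramid.
    height = abs(0.5 - point[j_max])
    return i + height


def storage_key(point, precision=4):
    """Hypothetical stand-in for the PBT-based key: a fixed-precision string."""
    return f"{pyramid_value(point):.{precision}f}"


def _hash(text):
    """Hash a string to a large integer position on the ring."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Consistent hashing with virtual nodes (replicas is an assumed default)."""

    def __init__(self, nodes, replicas=50):
        self._ring = sorted(
            (_hash(f"{node}#{r}"), node) for node in nodes for r in range(replicas)
        )
        self._points = [h for h, _ in self._ring]

    def node_for(self, key):
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    metadata = [0.2, 0.6]  # two normalized metadata attributes
    key = storage_key(metadata)
    print(key, "->", ring.node_for(key))
```

Because the ring positions depend only on the node names, removing one storage node reassigns only the keys that node held, which is the property that makes consistent hashing suitable for distributing index keys.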
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journal
2013, Issue 8, pp. 1592-1603 (12 pages)
Journal of Computer Research and Development
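Funding
National Basic Research Program of China (973 Program) (2010CB328104)
National Natural Science Foundation of China (61070161, 61202449, 61272054, 61003257)
National High Technology Research and Development Program of China (863 Program) (2013AA013503)
National Key Technology R&D Program of China (2010BAI88B03, 2011BAK21B02)
Specialized Research Fund for the Doctoral Program of Higher Education (20110092130002)
National Science and Technology Major Project of China (2010ZX01044-001-001)
Natural Science Foundation of Jiangsu Province (BK2008030)
Jiangsu Provincial Prospective Joint Research Project of Industry, Education, and Research (BY2012202)
Jiangsu Provincial Special Fund for the Transformation of Scientific and Technological Achievements (BA2012036)
Jiangsu Provincial Key Laboratory of Network and Information Security (BM2003201)
Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education (93K-9)
Shanghai Key Laboratory of Scalable Computing and Systems (Shanghai Jiao Tong University) (2010DS680095)
China Education and Research Grid (ChinaGrid) project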
Keywords
cloud computing
data indexing
multi-dimensional query
range query
consistent hashing