Abstract
To answer k closest pair queries (k-CPQ) over large datasets, this paper presents a query processing technique based on the R*-tree index under the MapReduce framework. First, a method for quickly building an R*-tree index on MapReduce is proposed. During index construction, a sampling approach quickly determines the space partition function, so that data objects are divided evenly among the partitions. The k closest pair query is then processed on the resulting R*-tree index. During query execution, an MBR-based pruning rule filters out irrelevant objects, which substantially reduces the amount of computation and improves query efficiency. Experimental results show that the algorithm has good computational efficiency and scalability, and that it handles k closest pair queries over large datasets well.
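The abstract does not spell out the pruning rule, so the following is only a minimal single-machine sketch of the general idea behind MBR-based pruning, not the paper's algorithm. It assumes 2-D points, uses plain lists of points as stand-ins for R*-tree leaf nodes, and leaves out MapReduce entirely; the names `mbr`, `min_dist`, and `k_closest_pairs` are hypothetical. The key observation is that the minimum distance between two bounding rectangles is a lower bound on the distance of every point pair they contain, so a pair of groups can be skipped once that bound is no better than the current k-th best distance.

```python
import heapq
import math
from itertools import product


def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def min_dist(a, b):
    """Lower bound on the distance between any point in MBR a and any in MBR b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def k_closest_pairs(groups_p, groups_q, k):
    """Return the k closest (distance, p, q) pairs across two grouped datasets.

    groups_p and groups_q are lists of point lists; each inner list plays the
    role of one leaf node (an assumption made for this illustration).
    """
    heap = []  # max-heap via negated distance; holds the best k pairs so far
    # Visit group pairs in order of increasing MBR lower bound.
    candidates = sorted(
        (min_dist(mbr(gp), mbr(gq)), i, j)
        for (i, gp), (j, gq) in product(enumerate(groups_p), enumerate(groups_q))
    )
    for lower_bound, i, j in candidates:
        if len(heap) == k and lower_bound >= -heap[0][0]:
            break  # no remaining group pair can beat the current k-th distance
        for p, q in product(groups_p[i], groups_q[j]):
            d = math.dist(p, q)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p, q))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, p, q))
    return sorted((-neg_d, p, q) for neg_d, p, q in heap)
```

Calling `k_closest_pairs(groups_p, groups_q, k)` with points given as coordinate tuples returns the pairs sorted by distance. Sorting the group pairs by their lower bound lets the loop stop at the first pair that cannot improve the result, which is one simple way to realize the "filter irrelevant objects" step.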
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in CSCD and the Peking University Core Journals list
2016, Issue 3, pp. 483-487 (5 pages)
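Funding
Supported by the National Natural Science Foundation of China (61003031)
Supported by the Shanghai Key Science and Technology Project (14511107902)
Supported by the Shanghai Engineering Center Construction Project (GCZX14014)
Supported by the Shanghai First-Class Discipline Construction Project (XTKX2012)
Supported by the Hujiang Foundation Research Base Special Project (C14001)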