期刊文献+

TKEP:海量数据上一种有效的Top-K查询处理算法 被引量:16

TKEP:An Efficient Top-K Query Processing Algorithm on Massive Data
下载PDF
导出
摘要 在许多应用领域中,top-k查询是一种十分重要的操作,它根据给定的评分函数在潜在的巨大的数据空间中返回k个最重要的对象.不同于传统的TA算法,NRA算法只需要顺序读就可以处理top-k查询,从而适合于随机读受限或不可能的场合.文中详细地分析了NRA算法的执行行为,确定了增长阶段和收缩阶段中每个文件需要扫描的元组个数.文中发现在海量数据环境中,NRA在增长阶段需要维护大量的候选元组,严重影响了算法的执行效率.所以,文中提出一种新的海量数据上的top-k查询算法TKEP,该算法在查询的增长阶段就执行早剪切,从而大大减少增长阶段需要维护的候选元组.文中给出了早剪切操作的数学分析,确定了早剪切操作的理论和实际剪切效果.据作者所知,该文是第一篇提出在top-k查询的增长阶段执行早剪切的文章.实验结果表明,和传统的NRA相比,TKEP在增长阶段维护的元组数量减少3个数量级,需要的内存量减少1个数量级,TKEP算法获得1个数量级的加速比. In many application fields,top-k is an important operation since it returns k most important objects according to a given ranking function.Different from traditional TA algorithms,NRA only requires sequential access to return top-k results so that it can be used in environment where random access is limited or impossible.This paper analyzes the execution behavior of NRA and determines tuple number to scan in increasing and shrinking phase.It is found that in massive data context,NRA needs to maintain large quantity of candidate tuples in increasing phase which affects algorithm efficiency significantly.This paper proposes a novel top-k algorithm TKEP(Top-K with Early Pruning) on massive data which performs early pruning in increasing phase to prune most of candidate tuples.This paper provides mathematical analysis of early pruning and proves its theoretical and practical pruning effect.To the best of our knowledge,it is the first paper to provide early pruning in top-k processing.The extensive experiments show that compared to NRA,TKEP maintains less tuples by a factor of three orders of magnitude,it consumes less memory by a factor of an order of magnitude and TKEP achieves substantial performance speed-up of an order of magnitude.
出处 《计算机学报》 EI CSCD 北大核心 2010年第8期1405-1417,共13页 Chinese Journal of Computers
基金 国家"九七三"重点基础研究发展规划项目基金(2006CB303005) 国家自然科学基金(60903016 60533110 60773063) 新世纪优秀人才支持计划(NCET-05-0333) 黑龙江省教育厅科学技术研究项目(11531276) NSFC-RGC of China(60831160525)资助~~
关键词 海量数据 TOP-K 早剪切 TKEP massive data top-k early pruning Top-K with Early Pruning
  • 相关文献

参考文献22

  • 1Korn Flip,Pagel Bernd-Uwe,Faloutsos Christos.On the ‘Dimensionality Curse' and the ‘Self-Similarity Blessing'.IEEE Transactions on Knowledge and Data Engineering,2001,13(1):96-111. 被引量:1
  • 2Fagin Ronald,Lotem Amnon,Naor Moni.Optimal aggregation algorithms for middleware//Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems(PODS'01).California,USA,2001:102-113. 被引量:1
  • 3Fagin Ronald,Lotem Amnon,Naor Moni.Optimal aggregation algorithms for middleware.Journal of Computer and System Sciences,2003,66(4):614-656. 被引量:1
  • 4Mamoulis Nikos,Cheng Kit Hung,Yiu Man Lung,Cheung David W.Efficient aggregation of ranked inputs//Proceedings of the 22nd International Conference on Data Engineering(ICDE'06).Atlanta,GA,USA,2006:72-83. 被引量:1
  • 5Mamoulis Nikos,Yiu Man Lung,Cheng Kit Hung,Cheung David W.Efficient top-k aggregation of ranked inputs.ACM Transactions on Database Systems(TODS),2007,32(3):19. 被引量:1
  • 6Pang HweeHwa,Ding Xuhua,Zheng Baihua.Efficient processing of exact top-k queries over disk-resident sorted lists.VLDB Journal,2010,19(3):437-456. 被引量:1
  • 7Fagin Ronald,Kumar Ravi,Sivakumar D.Efficient similarity search and classification via rank aggregation//Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD'03).San Diego,California,USA,2003:301-312. 被引量:1
  • 8Bloom Burton H.Space/time trade-offs in Hash coding with allowable errors.Communications of the ACM,1970,13(7):422-426. 被引量:1
  • 9Ilyas Ihab F,Beskales George,Soliman Mohamed A.A survey of top-k query processing techniques in relational database systems.ACM Computing Surveys,2008,40(4):11. 被引量:1
  • 10Bruno Nicolas,Chaudhuri Surajit,Gravano Luis.Top-k selection queries over relational databases:Mapping strategies and performance evaluation.ACM Transactions on Database Systems(TODS),2002,27(2):153-187. 被引量:1

同被引文献132

引证文献16

二级引证文献17

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部