摘要
针对Top-k dominating查询算法需要较高的时空消耗来构建属性组合索引,并且在相同属性值较多情况下的查询结果准确率低等问题,提出一种通过B+-trees和概率分布模型相结合的子空间支配查询算法——Ranking-k算法。首先,采用B+-trees为待查找数据各属性构建有序列表;然后,采取轮询调度算法读取skyline准则涉及到的有序列表,生成候选元组并获得k组终结元组;其次,根据生成的候选元组和终结元组,采用概率分布模型计算终结元组支配分数。迭代上述过程优化查询结果,直到满足条件为止。实验结果表明:Ranking-k与基本扫描算法(BSA)相比,查询效率提高了94.43%;与差分算法(DA)相比,查询效率提高了7.63%;与早剪枝Top-k支配(TDEP)算法、BSA和DA相比,查询结果更接近理论值。
Top-k dominating query algorithm requires high consumption of time and space to build combined indexes on the attributes, and the query accuracy is low for the data with same attribute values. To solve these problems, a Ranking-k algorithm was given in this paper. The proposed Ranking-k algorithm is a new subspace dominating query algorithm combining the B+-trees with probability distribution model. Firstly, the ordered lists for each data attribute were constructed by the B+-trees. Secondly, the round-robin scheduling algorithm was used to scan ordered attribute lists satisfying skyline criterion.Some candidate tuples were generated and k end tuples were obtained. Thirdly, the dominated scores of end tuples were calculated by using the probability distribution model according to the generated candidate tuples and end tuples. Through iterating the above process, the optimal query results were obtained. The experimental results show that the overall query efficiency of the proposed Ranking-k algorithm is improved by 94. 43% compared with the Basic-Scan Algorithm( BSA) and by7. 63% compared with the Differential Algorithm( DA), and the query results of Ranking-k algorithm are much closer to theoretical values in comparison of the Top-k Dominating with Early Pruning( TDEP) algorithm, BSA and DA.
出处
《计算机应用》
CSCD
北大核心
2015年第1期108-114,共7页
journal of Computer Applications
基金
国家自然科学基金资助项目(61303127)
国家科技支撑计划项目(2013BAH32F02
2013BAH32F03)
国防重点学科实验室项目(13zxnk12)
四川省教育厅重点项目(13ZA0169)
四川省苗子工程资助项目(2014-043)
西南科技大学研究生创新基金资助项目(14ycx057)