Abstract
A probabilistic threshold top-k query over an uncertain database computes, for each tuple, the sum of its probabilities of being ranked among the top k, and returns the tuples whose probability sum is at least p. Existing query semantics, however, do not treat the tuples within an x-tuple as a whole. To address this, a new query semantics, the probabilistic threshold x-top-k query, is defined, and a query processing algorithm is given. Under this semantics, dynamic programming is used to compute the top-k probability sum of each tuple within an x-tuple; these values are aggregated per x-tuple, and a probabilistic threshold top-k query is then performed on the aggregated probabilities. The algorithm is optimized with pruning methods such as the observation method and the maximum upper bound. Experimental results show that the algorithm returns the correct result set after scanning, on average, about 60% of the data set, which indicates that its query processing is efficient.
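The pipeline described in the abstract (per-tuple top-k probability by dynamic programming, aggregation per x-tuple, then a threshold test) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the usual x-tuple model (alternatives within an x-tuple are mutually exclusive, different x-tuples are independent, higher score ranks first, ties do not count as higher), and it omits the pruning steps, which the abstract does not specify. The names `XRelation`, `topk_probability`, `x_topk_probabilities` and `pt_x_topk` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: x-tuple id -> list of (score, probability) alternatives.
# Assumption: alternatives of one x-tuple are mutually exclusive,
# and different x-tuples are independent of each other.
XRelation = Dict[str, List[Tuple[float, float]]]


def topk_probability(rel: XRelation, xid: str, score: float, prob: float, k: int) -> float:
    """Probability that alternative (score, prob) of x-tuple `xid` ranks in the top k (k >= 1)."""
    # dp[j] = probability that exactly j other x-tuples produce a higher-scored
    # tuple; only j < k is tracked, larger counts are dropped.
    dp = [1.0] + [0.0] * (k - 1)
    for other, alts in rel.items():
        if other == xid:
            continue  # sibling alternatives never coexist with this one
        # Chance that this other x-tuple contributes a tuple ranked above `score`.
        q = sum(p for s, p in alts if s > score)
        for j in range(k - 1, 0, -1):  # descending, so dp[j - 1] is still the old value
            dp[j] = dp[j] * (1.0 - q) + dp[j - 1] * q
        dp[0] *= 1.0 - q
    return prob * sum(dp)


def x_topk_probabilities(rel: XRelation, k: int) -> Dict[str, float]:
    """Aggregate the top-k probabilities of all alternatives of each x-tuple."""
    return {
        xid: sum(topk_probability(rel, xid, s, p, k) for s, p in alts)
        for xid, alts in rel.items()
    }


def pt_x_topk(rel: XRelation, k: int, p: float) -> List[str]:
    """Return ids of x-tuples whose aggregated top-k probability is at least p."""
    agg = x_topk_probabilities(rel, k)
    return sorted(xid for xid, value in agg.items() if value >= p)


# Toy data, invented for illustration only.
example: XRelation = {
    "A": [(100.0, 0.5), (60.0, 0.5)],
    "B": [(90.0, 0.8)],
    "C": [(80.0, 1.0)],
}
```

For this toy relation, `pt_x_topk(example, 2, 0.7)` keeps only the x-tuple `"B"`. The sketch scans every x-tuple for every alternative; a pruning step such as an upper bound on the remaining probability would be what lets an algorithm stop before reading the whole data set.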
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD
2013, Issue 4, pp. 44-47 (4 pages)
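Funding
National "973" Program of China, project "Research on Basic Theory and Key Technologies of Massive Information Usability" (2012CB316200)
Chinese Polar Environment Comprehensive Investigation and Assessment Programmes (CHINARE2012-04-07)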