Probabilistic Threshold top-k Query Algorithm Based on x-tuple (基于x-tuple的概率阈值top-k查询算法)

Abstract: In an uncertain database, a probabilistic threshold top-k query computes, for each tuple, the sum of the probabilities that it ranks among the top k, and returns the tuples whose probability sum is no less than a threshold p. Existing query semantics, however, do not treat the tuples within an x-tuple as a whole. To address this, a new query semantics, the probabilistic threshold x-top-k query, is defined and a query processing algorithm is given. Under this semantics, dynamic programming is used to compute the probability that each tuple within an x-tuple ranks among the top k. These probabilities are aggregated per x-tuple, and a probabilistic threshold top-k query is then run on the aggregated values. The algorithm is optimized with pruning methods such as the observation method and the maximum upper bound. Experimental results show that the algorithm returns the correct result set after scanning, on average, 60% of the whole data set, which indicates that its query processing is efficient.
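The pipeline described in the abstract (per-tuple top-k probability by dynamic programming, aggregation per x-tuple, then a threshold test) can be illustrated with a small sketch. This is not the authors' algorithm: it is a naive version under the usual x-relation assumptions (tuples in the same x-tuple are mutually exclusive, different x-tuples are independent, scores are distinct), and it omits the pruning methods entirely. The function names `topk_probabilities` and `x_topk` and the input layout `(tuple_id, xtuple_id, score, prob)` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def topk_probabilities(tuples, k):
    """Return {tuple_id: Pr(tuple ranks in the top k)}.

    tuples: list of (tuple_id, xtuple_id, score, prob), scores assumed distinct.
    Assumed model: tuples sharing an xtuple_id are mutually exclusive;
    different x-tuples are independent.
    """
    ranked = sorted(tuples, key=lambda t: -t[2])
    # For each x-tuple, total probability of its tuples ranked above the current one.
    higher_mass = defaultdict(float)
    result = {}
    for tid, xid, _score, prob in ranked:
        # dp[j] = Pr(exactly j OTHER x-tuples contribute a higher-ranked tuple),
        # truncated at j = k - 1 because larger counts push the tuple out of the top k.
        dp = [1.0] + [0.0] * (k - 1)
        for other, q in higher_mass.items():
            if other == xid:
                continue  # siblings cannot appear together with this tuple
            for j in range(k - 1, 0, -1):
                dp[j] = dp[j] * (1.0 - q) + dp[j - 1] * q
            dp[0] *= 1.0 - q
        result[tid] = prob * sum(dp)
        higher_mass[xid] += prob
    return result


def x_topk(tuples, k, p):
    """Aggregate top-k probabilities per x-tuple and keep those with sum >= p."""
    per_tuple = topk_probabilities(tuples, k)
    aggregated = defaultdict(float)
    for tid, xid, _score, _prob in tuples:
        aggregated[xid] += per_tuple[tid]
    return {xid: pr for xid, pr in aggregated.items() if pr >= p}


if __name__ == "__main__":
    data = [
        ("t1", "A", 100, 0.5),
        ("t2", "B", 90, 0.6),
        ("t3", "A", 80, 0.4),
        ("t4", "C", 70, 1.0),
    ]
    print(x_topk(data, k=2, p=0.5))
```

On this toy input with k = 2, the aggregated probabilities are about 0.9 for x-tuple A, 0.6 for B, and 0.46 for C, so a threshold of p = 0.5 keeps A and B. The sketch recomputes the dynamic-programming table for every tuple, which is exactly the cost that pruning rules such as those mentioned in the abstract are meant to reduce.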

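Authors: Huang Dongmei (黄冬梅), Shu Bo (舒博), Wang Jian (王建), Xiong Zhongmin (熊中敏)

Affiliation: College of Information Technology, Shanghai Ocean University (上海海洋大学信息学院)

Source: Computer Engineering (《计算机工程》), CAS, CSCD, 2013, No. 4, pp. 44-47 (4 pages)

Funding: National "973" Program project "Research on Basic Theory and Key Technologies of Massive Information Availability" (2012CB316200); Special Fund for Comprehensive Investigation and Assessment of the Arctic and Antarctic Environment (CHINARE2012-04-07)

Keywords: uncertain database; probabilistic threshold top-k query; x-tuple; dynamic programming algorithm; aggregation

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]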
