Abstract
Existing algorithms for Top-k keyword search over uncertain XML return only the root nodes with the k highest existence probabilities; further processing is then needed to construct the subtrees that satisfy the required conditions, which is inefficient. To address this problem, this paper defines a new Top-k query semantics based on the smallest related connected subtree, SRCT-Top-k (smallest related connected subtree Top-k), which returns the smallest related connected subtrees with the k highest probabilities, and proposes the PrListTop-k algorithm, built on a dynamic keyword data repository, to process SRCT-Top-k queries. PrListTop-k constructs the subtrees that satisfy the required conditions by scanning the dynamic keyword data repository only once, and applies filtering strategies to reduce the number of intermediate results. Theoretical analysis and experimental results show that PrListTop-k is an efficient Top-k keyword search algorithm over uncertain XML.
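The abstract does not give the details of PrListTop-k, so the following is only a generic sketch of the underlying idea of keeping the k most probable results while discarding weaker candidates early. It is not the paper's algorithm. The function name `top_k_by_probability`, the subtree identifiers, and the probability values are all hypothetical, chosen for illustration.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_by_probability(
    candidates: Iterable[Tuple[str, float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (subtree_id, probability) pairs with the highest probability.

    Generic bounded-heap Top-k selection; an illustrative assumption,
    not the PrListTop-k algorithm described in the paper.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # min-heap; heap[0] is the current k-th best
    for subtree_id, prob in candidates:
        if len(heap) < k:
            heapq.heappush(heap, (prob, subtree_id))
        elif prob > heap[0][0]:
            # The candidate beats the current k-th best, so it replaces it.
            heapq.heapreplace(heap, (prob, subtree_id))
        # Otherwise the candidate is filtered out and never stored.
    # Sort the surviving entries from highest to lowest probability.
    return [(sid, p) for p, sid in sorted(heap, reverse=True)]


# Hypothetical candidate subtrees with made-up existence probabilities.
example = [("t1", 0.30), ("t2", 0.72), ("t3", 0.15), ("t4", 0.55)]
result = top_k_by_probability(example, 2)
```

The heap never holds more than k entries, which is one simple way to keep the number of intermediate results small during a single pass over the candidates.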
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2014, No. 12, pp. 2691-2696 (6 pages)
Journal of Chinese Computer Systems
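Funding
Supported by the National Natural Science Foundation of China (61163015)
Supported by the Key Project of the Natural Science Foundation of Inner Mongolia (2013MS0909)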