摘要
空间文本数据流上连续k近邻查询(Continuous k-nearest neighbor Queries over Spatial-Textual data streams,CkQST)能在空间文本对象组成的数据流上检索并实时更新k个包含指定关键字的空间邻近对象,是空间文本数据流上连续查询(Continuous Queries over Spatial-Textual data streams,CQST)的一种,以预订(subscribe)的方式广泛应用于广告定位、微博分析、地图导航等领域.求解CkQST采用CQST的求解框架——构建空间文本混合索引组织查询,利用索引的空间过滤和文本过滤能力,为不断到来的对象匹配查询.该框架的求解效率取决于索引的过滤能力,提高索引过滤能力的主要途径是将查询的空间搜索范围映射到索引结构的最小区域,减少需要验证的查询数量.这一途径适用于查询空间搜索范围很少变化的情况.对于CkQST,覆盖k个最邻近对象的空间范围随着符合文本匹配条件的对象的数量的变化而变化,与之对应的索引项需要同步更新,代价高.针对这一问题,本文选择能够高效支持空间范围变化的Quad-tree和关键字查找的倒排索引,构成空间文本混合索引,组织CkQST.在空间过滤方面,提出内存代价模型VUMBCM(Verification and Update of Memory-Based Cost Model,VUMBCM),通过平衡索引更新代价和验证代价,优化查询空间搜索范围到Quad-tree节点的映射.在文本过滤方面,采用基于块的有序倒排索引,组织Quad-tree节点内的查询,以快速定位需要验证的查询,避免对倒排列表中大量不可能匹配查询的访问;批量处理包含共同文本项的对象,提高文本验证时的对象吞吐量.由此构建的混合索引,称为OIQ-tree.实验表明,OIQ-tree中的代价模型及基于块的有序倒排索引能够支持CkQST的高效求解.与目前先进的索引技术相比,当查询规模达到2000万时,因数据流中对象的变化导致的索引平均更新时间降低了46%,数据流中对象的平均处理时间降低了22%.
The continuous k nearest neighbor queries over spatial textual data streams(CkQST for short)retrieve and continuously monitor at most k nearest neighbor objects to the user specified location containing all the user specified keywords over the data streams composed of spatial textual objects,which is a type of continuous queries over spatial textual data streams and has been widely used in a wide variety of location based applications,such as location aware targeting of advertisements,analysis of microblogs and mobile navigation services etc.by way of subscriptions.Evaluating CkQST utilizes the solution framework of evaluating the generic continuous queries over spatial textual data streams,i.e.selecting a spatial index and a textual index to form a hybrid spatial textual index to organize the queries,and matching the incoming objects continuously generated utilizing the spatial and textual filtering capabilities of the index.In this framework,the evaluation efficiency depends on the filtering ability of the index,and the major approach to improving the filtering ability of the index is to map the spatial search range of the queries to the smallest area of the index structure to reduce the number of queries being verified by the objects over data streams,which is suitable for the situations where the search range of the queries rarely changes.For CkQST,the spatial range covering k nearest neighbor qualified objects frequently changes with the number of the objects containing all the query keywords,and accordingly the index should be updated synchronously,which requires very expensive cost.To solve the problem,this paper selects the Quad-tree integrated with an inverted index to construct a hybrid spatial textual index to organize CkQST,where the Quad-tree can efficiently support the frequent change of the spatial range of CkQST,and the inverted index can efficiently support the keyword query.With respect to the spatial filtering,a memory based cost model VUMBCM(Verification and Update of Memory Based Cost Model)i
作者
杨茸
牛保宁
YANG Rong;NIU Bao-Ning(College of Information and Computer,Taiyuan University of Technology,Jinzhong,Shanxi 030600)
出处
《计算机学报》
EI
CAS
CSCD
北大核心
2021年第8期1732-1750,共19页
Chinese Journal of Computers
基金
国家自然科学基金(61572345)资助。
关键词
空间文本查询
数据流
空间文本索引
K近邻
连续查询
spatial-textual query
data streams
spatial-textual index
k nearest neighbor
continuous query