
New spatio-temporal index method based on real-time data and query log distribution (cited by: 2)
Abstract In the big data era, data is large in volume, has pronounced spatio-temporal complexity, and must often be processed in real time. Traditional tree-based methods for indexing large-scale spatio-temporal data waste storage space and answer queries inefficiently. To address this, a new method named HDL-index is proposed, which builds a spatio-temporal index from the distribution of both the data and the historical query records. On the one hand, the algorithm builds index grid cells by partitioning space according to the spatial distribution of the data. On the other hand, because queries tend to persist over time, it applies density-based clustering to historical query records, abstracts representative query models from the clusters, and then partitions the overall query region according to each model's coordinates and query granularity. The grid cells from both parts are encoded with GeoHash and then merged to obtain the optimal index encoding. Because HDL-index accounts for user query behavior as well as data distribution, the index is finer in frequently queried regions. In comparative tests against similar methods on a real aviation dataset, index creation efficiency improved by 50%, and query efficiency over hotspot regions improved by more than 75% when the data were uniformly distributed.
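The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of a GeoHash grid that is finer in frequently queried regions. It is not the authors' HDL-index algorithm. The function names `geohash_encode` and `index_key`, the `hot_prefixes` set (standing in for the output of the query clustering step), and the `coarse`/`fine` precisions are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: NOT the HDL-index algorithm from the paper.
# All names and parameter values here are assumptions.

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a GeoHash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def index_key(lat, lon, hot_prefixes, coarse=4, fine=6):
    """Return a fine-grained cell code inside hot regions, a coarse one elsewhere.

    hot_prefixes is a hypothetical set of GeoHash prefixes marking
    frequently queried regions (e.g. derived from clustered query logs).
    """
    code = geohash_encode(lat, lon, fine)
    if any(code.startswith(prefix) for prefix in hot_prefixes):
        return code
    return code[:coarse]


if __name__ == "__main__":
    hot = {"u4pr"}  # assumed hotspot cell
    print(index_key(57.64911, 10.40744, hot))  # point inside the hotspot
    print(index_key(0.0, 0.0, hot))            # point outside the hotspot
```

The sketch relies on the prefix property of GeoHash: a shorter code denotes a cell that contains every cell whose code extends it, so coarse and fine cells can coexist in one key space. How HDL-index actually merges the data-driven and query-driven grids is not specified in the abstract.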
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2017, No. 3, pp. 860-865 (6 pages)
Funding National Natural Science Foundation of China (61502106); Fujian Province Regional Major Science and Technology Special Project (2014H4015)
Keywords spatio-temporal index; big data; GeoHash encoding; density clustering; hotspot region query
