Abstract
Distributed database systems and distributed parallel computing are key technologies of GeoCloud 2.0. The Geospatial Big Data System (GBDS) was designed and implemented on the open-source distributed database system HBase, the parallel computing framework Spark, and the spatial information server GeoServer. This paper discusses the key technologies that affect the system's spatial query and computation performance, and reports experiments and tests of the system on real geological spatial data. The experimental results show that a geospatial big data system built on open-source technologies is feasible and performs more efficiently than traditional technologies. HBase distributed storage and spatial indexing significantly improve spatial query performance for geospatial big data, and Spark parallel computing and multi-threading notably improve its computation performance.
Authors
QI Shaofan (齐少凡)
YU Leiyi (于雷易)
BAI Ming (白明)
MEI Lisi (梅丽斯)
WANG Yanhui (王延惠)
Affiliations: (Technology Innovation Center of Geological Information, MNR, Beijing 100037, China; …2. Development and Research Center of China Geological Survey, Beijing 100037, China; …3. Teleware Info & Tech Co., Ltd., Fuzhou 350001, China)
Source
Land and Resources Informatization (《国土资源信息化》)
2020, No. 4, pp. 16-21 (6 pages)
Funding
China Geological Survey geological survey special project "National Geological Big Data Aggregation and Management" (DD20190381).
Keywords
spatial geological big data (地质空间大数据)
distributed database (分布式数据库)
spatial index (空间索引)
distributed parallel computation (分布式并行计算)
spatial information service (空间信息服务)