Abstract
Geoscience time series big data is multi-source and heterogeneous: it comes from multiple sensors, covers multiple targets, and spans multiple resolutions and data types. It is an important data source for machine learning and data mining in the geosciences, and it falls into two categories: time-point-based and time-interval-based time series data. Existing research on similarity measurement and indexing of time series data focuses mainly on the former. The core idea of time series representation methods is dimensionality reduction. Representation is the basis of both similarity measurement and indexing, and the main approaches are domain-transformation-based, model-based, and segmentation-based (piecewise) representations. The core of similarity measurement is the computation of a similarity distance, for which there are two main families: lock-step measures and elastic measures. The similarity distance provides the basic criterion for aggregating and splitting index entries in a time series index. Efficient similarity measurement and distributed indexing methods for multi-source heterogeneous geoscience time series big data are an important future research direction in geoscience big data.
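To make the terms in the abstract concrete, the following is a minimal illustrative sketch and is not taken from the paper. The abstract names no specific algorithms, so three widely used textbook choices are assumed here: Piecewise Aggregate Approximation (PAA) as a piecewise, dimensionality-reducing representation, Euclidean distance as a lock-step measure, and dynamic time warping (DTW) as an elastic measure. The function names `paa`, `euclidean`, and `dtw` are hypothetical.

```python
import math


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: replace each equal-width frame by its mean.

    Assumption for brevity: len(series) must be divisible by n_segments.
    """
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("len(series) must be a positive multiple of n_segments")
    width = n // n_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def euclidean(a, b):
    """Lock-step measure: the i-th point of a is compared only with the i-th point of b."""
    if len(a) != len(b):
        raise ValueError("lock-step measures need series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Elastic measure: dynamic time warping with absolute difference as the local cost.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # Two series with the same peak, shifted by one time step.
    a = [0, 0, 1, 2, 1, 0, 0, 0]
    b = [0, 0, 0, 1, 2, 1, 0, 0]
    print(paa(a, 4))        # reduced 4-value representation of a
    print(euclidean(a, b))  # penalizes the shift
    print(dtw(a, b))        # warps the time axis to absorb the shift
```

In this toy case the lock-step distance is nonzero because the peaks are misaligned by one step, while the elastic distance is zero because warping aligns them. A reduced representation such as PAA is what an index would typically store in place of the raw series.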
Authors
何珍文
吴冲龙
刘刚
田宜平
张夏林
陈麒玉
He Zhenwen; Wu Chonglong; Liu Gang; Tian Yiping; Zhang Xialin; Chen Qiyu (School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430078, China; Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences (Wuhan), Wuhan 430078, China; Technology Innovation Center of Mineral Resources Explorations in Bedrock Zones, Ministry of Natural Resources, Guiyang 550081, China)
Source
《地质科技通报》
CAS
CSCD
Peking University Core Journals (北大核心)
2020, No. 4, pp. 44-50 (7 pages)
Bulletin of Geological Science and Technology
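Funding
National Natural Science Foundation of China (41972306, 41572314, U1711267)
Guizhou Provincial Science and Technology Program (黔科合支撑[2017]2951, 黔科合支撑[2019]2868, 黔科合支撑[2020]4Y039, 黔科合平台[2018]5618)
Guizhou Provincial Geological Exploration Fund (No. 2019-02)
Research projects of the Bureau of Geology and Mineral Exploration and Development of Guizhou Province (黔地矿科合[2017]2, 黔地矿科合[2018]07)
Hubei Province Innovation Group Project (2019CFA023)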
Keywords
time series data
big data
representation
index
similarity measurement