Abstract
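Ocean temperature data play a key role in global ocean observation and climate research, and quality control is critical to ensuring their reliability; however, the recall of anomalous data on large datasets is still unsatisfactory. Based on Argo temperature data, this paper proposes a quality control method that combines a rule set with a multilayer perceptron (RS-MLP). First, 13 machine learning models are compared and the best-performing one is selected; next, a rule set made up of six rule-based quality control check modules is designed; finally, the rule set and the selected model are integrated to build RS-MLP, whose performance is evaluated on Argo data from the South China Sea. The results show that, on a test set of 351746 temperature records, RS-MLP reaches a true positive rate (TPR) of 93%, a true negative rate (TNR) of 96%, and an area under the receiver operating characteristic (ROC) curve (AUC) of 95%; its recall of anomalous data is also fairly stable across depth levels, indicating excellent quality control performance.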
Ocean temperature data play a crucial role in global ocean observation and climate research. Quality control is essential to ensure the reliability of these data. However, the current recall rate of anomalous data in large datasets is unsatisfactory. This paper proposes a quality control method based on a rule set and multilayer perceptron (RS-MLP), using Argo temperature data. Initially, thirteen machine learning models are compared and analyzed to select the optimal model. Subsequently, a rule set consisting of six rule-based quality control check modules is designed. Finally, the RS-MLP method is constructed by integrating the rule set with the optimal machine learning model, and its performance is evaluated using Argo data from the South China Sea region. The results show that RS-MLP achieves good performance, with the true positive rate (TPR), true negative rate (TNR), and area under the receiver operating characteristic (ROC) curve (AUC) reaching 94%, 96%, and 95%, respectively, on a test set of 351746 temperature data points. The recall rate of anomalous data at different depth levels is stable, demonstrating excellent quality control performance.
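To make the idea of integrating a rule set with a learned classifier more concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not list the six rule modules or say how their output is merged with the model, so everything here is an assumption: the two checks (`global_range_check`, `spike_check`) and their thresholds are illustrative, the OR-style merge in `combine` is only one plausible integration, and `model_probs` is a hand-written stand-in for the anomaly probabilities a trained multilayer perceptron would produce.

```python
def global_range_check(temps, low=-2.5, high=40.0):
    """Flag values outside a plausible temperature range (bounds are illustrative)."""
    return [not (low <= t <= high) for t in temps]


def spike_check(temps, threshold=6.0):
    """Flag interior points that deviate sharply from both neighbours.

    End points are never flagged because they lack two neighbours.
    """
    flags = [False] * len(temps)
    for i in range(1, len(temps) - 1):
        prev, cur, nxt = temps[i - 1], temps[i], temps[i + 1]
        test_value = abs(cur - (prev + nxt) / 2) - abs((nxt - prev) / 2)
        flags[i] = test_value > threshold
    return flags


def combine(rule_flags, model_probs, prob_threshold=0.5):
    """Assumed integration: anomalous if any rule fires OR the model is confident."""
    per_sample_rules = zip(*rule_flags)
    return [any(rules) or p >= prob_threshold
            for rules, p in zip(per_sample_rules, model_probs)]


def tpr_tnr(y_true, y_pred):
    """Return (TPR, TNR); assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Toy profile (degrees Celsius) with made-up labels; True marks an anomaly.
temps = [28.0, 27.8, 45.0, 27.4, 27.0, 18.0, 26.5, 26.2]
y_true = [False, False, True, False, True, True, False, False]

# Hypothetical stand-in for MLP output probabilities (not real model results).
model_probs = [0.05, 0.10, 0.90, 0.08, 0.70, 0.40, 0.03, 0.02]

rule_flags = [global_range_check(temps), spike_check(temps)]
y_pred = combine(rule_flags, model_probs)
tpr, tnr = tpr_tnr(y_true, y_pred)
```

In this toy profile the spike check catches the value at index 5 that the stand-in model scores below the threshold, while the model catches the subtle case at index 4 that neither rule flags, which is the kind of complementarity a combined method could exploit.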
Authors
QI Huandong; ZHU Cheng; LI Xuchun; JING Xindi; SONG Derui (College of Information Technology (Shanghai Ocean University), Shanghai 201306, China; National Marine Environmental Monitoring Center, Dalian 116023, China; School of Geographical Sciences (Liaoning Normal University), Dalian 116029, China)
Source
Journal of Tropical Oceanography (热带海洋学报)
CAS
CSCD
PKU Core (北大核心)
2024, Issue 5, pp. 190-202 (13 pages)
Funding
National Key Research and Development Program of China (2021YFF0704000, 2022YFC3106100).
Keywords
Argo
temperature
machine learning
quality control