Abstract
PM2.5 is a major air pollutant that threatens human health, so effective prediction and timely warning are important. Many studies have shown that Random Forest (RF) models that incorporate information from surrounding stations perform well in predicting PM2.5 concentration at a single station. However, the selection of surrounding stations has not yet been studied in a targeted way, and some existing selection methods are subjective. This paper proposes a method for optimizing the selection of surrounding stations based on Time-Lag Cross-Correlation (TLCC) analysis. Taking the Shanghai Shiwuchang air quality monitoring station (a national-level station) as an example, a set of RF regression models was built to predict PM2.5 concentration at the station over the next 1 to 24 hours, and the importance of each input factor in the prediction models was compared and analyzed. The current PM2.5 concentration at the prediction station was the most important input for forecasts 1 to 16 hours ahead, while wind direction, among the meteorological variables, was the most important for forecasts 17 to 24 hours ahead. As the forecast horizon lengthened, the PM2.5 information from surrounding stations rose markedly in the importance ranking, and the influence of individual stations differed significantly across forecast horizons, so stations should be treated differently and selected optimally when modeling. The comparison showed that prediction models built with surrounding stations selected by the proposed method not only had an advantage in accuracy metrics such as RMSE (RMSE for the 12-hour and 24-hour forecasts decreased by 11.8% and 13.3%, respectively), but also had a markedly lower false alarm ratio for pollution events, a metric of practical value (the false alarm ratio for the 12-hour and 24-hour forecasts decreased by 16.1% and 25.6%, respectively). The method therefore has potential for operational application.
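The TLCC-based station selection described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact selection rule, so everything below is an assumption for illustration only: the function names (`lagged_correlation`, `best_lag`, `select_stations`), the use of Pearson correlation, the rule of ranking stations by their correlation at a lag equal to the forecast horizon, and the synthetic data. It is not the authors' implementation.

```python
import numpy as np


def lagged_correlation(target, neighbor, lag):
    """Pearson correlation between neighbor[t] and target[t + lag] (lag >= 0)."""
    target = np.asarray(target, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    if lag > 0:
        x, y = neighbor[:-lag], target[lag:]
    else:
        x, y = neighbor, target
    ok = ~(np.isnan(x) | np.isnan(y))  # drop pairs with missing observations
    if ok.sum() < 3:
        return float("nan")
    return float(np.corrcoef(x[ok], y[ok])[0, 1])


def best_lag(target, neighbor, max_lag=24):
    """Lag (in time steps) at which the neighbour best leads the target."""
    corrs = [lagged_correlation(target, neighbor, k) for k in range(max_lag + 1)]
    k = int(np.nanargmax(corrs))  # raises ValueError if every lag is NaN
    return k, corrs[k]


def select_stations(target, neighbors, horizon, top_n=3):
    """Assumed rule: keep the top_n stations by correlation at lag == horizon."""
    scores = {name: lagged_correlation(target, series, horizon)
              for name, series in neighbors.items()}
    scores = {name: s for name, s in scores.items() if not np.isnan(s)}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Synthetic hourly series (hypothetical data): the "upwind" station leads the
# target by 5 hours, while the "unrelated" station is independent noise.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n + 5)
upwind = signal[5:]                                  # upwind[t] == signal[t + 5]
target = signal[:n] + 0.1 * rng.normal(size=n)       # target[t + 5] ~ upwind[t]
unrelated = rng.normal(size=n)

lag, corr = best_lag(target, upwind, max_lag=24)
chosen = select_stations(
    target, {"upwind": upwind, "unrelated": unrelated}, horizon=5, top_n=1
)
print(lag, round(corr, 3), chosen)
```

By construction, the synthetic upwind series should show its correlation peak at a 5-hour lead. With real monitoring data, one selection pass per forecast horizon would yield a different station subset for each of the 1 to 24 h models, which matches the abstract's point that stations should be treated differently across horizons.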
Authors
YAO Hongyan (姚红岩)
SHI Runhe (施润和)
YAO Hongyan; SHI Runhe (Key Laboratory of Geographical Information Science, Ministry of Education, East China Normal University, Shanghai 200241; School of Geographic Sciences, East China Normal University, Shanghai 200241; Joint Laboratory for Environmental Remote Sensing and Data Assimilation, East China Normal University, Shanghai 200241; Joint Research Institute of Resources and Environment, East China Normal University, Shanghai 200062; Institute of Eco-Chongming, East China Normal University, Shanghai 202162)
Source
Acta Scientiae Circumstantiae (《环境科学学报》)
CAS
CSCD
PKU Core (北大核心)
2021, Issue 4, pp. 1565-1573 (9 pages)
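Funding
National Key Research and Development Program of China (No. 2016YFC1302602)
Major Project of Philosophy and Social Science Research, Ministry of Education (No. 19JZD023)
Science and Technology Innovation Action Plan of the Shanghai Science and Technology Commission (No. 19DZ1201505)
Fundamental Research Funds for the Central Universities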
Keywords
time-lag cross-correlation
time series
atmospheric pollutants
machine learning