Abstract
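[Objective] Logistic regression and the random forest algorithm were used to build forest fire occurrence prediction models for the Tahe area of the Daxing'an Mountains. Their prediction accuracy was compared to judge how well the random forest algorithm suits forest fire prediction in this area and to provide technical support for local forest fire management. [Method] Using forest fire occurrence data for the Tahe area from 1974 to 2008, binomial logistic regression and the random forest algorithm were each applied in an empirical analysis of the relationship between fire occurrence and meteorological factors. To reduce the influence of the training-sample distribution on the results, the full dataset was randomly divided into a training sample (60%) and a test sample (40%); this was repeated five times, giving five intermediate models (sample groups). Variables (factors) that were significant in at least three of the five intermediate models were selected for the analysis of the full dataset, and the prediction accuracy of the two methods was compared for the five intermediate models and for the full-sample model. In addition, a variable interaction test was designed to further verify the prediction accuracy of the two models under the same variables. [Result] Three factors, daily minimum relative humidity, the Fine Fuel Moisture Code and the Drought Code, were significantly correlated with fire occurrence in both the binomial logistic regression model and the random forest algorithm. Model fitting showed the following. For the five intermediate models, the prediction accuracy of the random forest algorithm was about 8% and 10% higher than that of binomial logistic regression for the training samples (60%) and the test samples (40%), respectively. For the full-sample model, the accuracy of the random forest algorithm was 85.0% and that of binomial logistic regression was 76.2%, a difference of about 10%, consistent with the results for the five intermediate models. In the variable interaction test, the accuracy of the random forest algorithm was 86.0% and that of binomial logistic regression was 72.8%, so the random forest algorithm improved prediction accuracy by about 18.1%. [Conclusion] Daily minimum relative humidity, the Fine Fuel Moisture Code and the Drought Code are the main meteorological factors affecting fire occurrence. In modelling fire occurrence in the Tahe area from meteorological factors, the prediction accuracy of the random forest algorithm was about 10% higher than that of the traditional binomial logistic regression model; it therefore has a certain predictive advantage and application value, and it can, for Tahe in the Daxing'an Mountains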
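The accuracy gaps quoted in the results appear to mix two measures: a difference in percentage points and a relative gain over the logistic regression baseline. The short sketch below recomputes both from the accuracies stated in the abstract. The function name `accuracy_gap` is hypothetical, and reading the 18.1% figure as a relative gain is an inference from the arithmetic, not something the abstract states.

```python
from typing import Tuple


def accuracy_gap(rf_acc: float, lr_acc: float) -> Tuple[float, float]:
    """Return (percentage-point gap, relative gain in %) of RF over LR.

    Both inputs are accuracies expressed in percent, e.g. 85.0 for 85.0%.
    """
    points = rf_acc - lr_acc            # absolute gap in percentage points
    relative = points / lr_acc * 100.0  # gain relative to the LR baseline
    return round(points, 1), round(relative, 1)


# Accuracies quoted in the abstract (full-sample model, variable interaction test).
full_sample = accuracy_gap(85.0, 76.2)
interaction = accuracy_gap(86.0, 72.8)

print("full sample:", full_sample)   # (8.8, 11.5)
print("interaction:", interaction)   # (13.2, 18.1)
```

Under this reading, "about 10%" for the full-sample model is closest to the 8.8-point gap, while 18.1% for the variable interaction test matches the relative gain and corresponds to a 13.2-point gap.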
[Objective] In this study, two methods were applied to establish fire prediction models for Tahe, Daxing'an Mountains. Our objective was to assess the applicability of the random forest algorithm to local forest fire prediction by comparing prediction accuracy. This study provides technical support for local forest fire management. [Method] Fire data collected in Tahe, Daxing'an Mountains between 1974 and 2008 were used in a case study to identify the relationship between fire occurrence and meteorological factors using a logistic regression (LR) model and the random forest (RF) algorithm, respectively. To reduce the influence of sample distribution on model fitting, the original dataset was randomly divided into training (60%) and validation (40%) samples. The procedure was repeated five times using a sampling with replacement method, yielding five random sub-samples (sample groups) of the data, each with a training and a validation dataset. Predictors that were significant at α = 0.05 in at least three of the five intermediate models were included in the final models. In addition, a "cross validation" test was used to assess the accuracy of the two models. [Result] Model parameter estimation indicated that daily minimum relative humidity, the fine fuel moisture code (FFMC) and the drought code (DC) were important predictors in both the LR and RF models. Model fitting revealed that the prediction accuracy of the LR model in the five intermediate models was 8% and 10% lower than that of RF for the training and validation samples, respectively. However, the prediction accuracy of RF on the complete dataset was 15% higher than that of LR. In the cross validation test, the prediction accuracy of RF was 85.0%, higher than that of LR (76.2%), and the result agreed with that of the five sample groups.
[Conclusion] Our results revealed that the RF model was superior to the LR model on the fire pre
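The resampling and variable-screening steps described in the Method part of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each repeat is taken here as an independent random 60/40 partition of the sample indices, the model fitting that decides significance at α = 0.05 is left out, and the function names and the predictor labels (`min_rh`, `ffmc`, `dc`, `wind`) are hypothetical.

```python
import random
from collections import Counter


def repeated_splits(n_samples, train_frac=0.6, repeats=5, seed=0):
    """Return `repeats` random (train, test) index partitions.

    Assumption: every repeat is an independent shuffle of all indices,
    cut at train_frac (60% training, 40% testing by default).
    """
    rng = random.Random(seed)
    n_train = round(n_samples * train_frac)
    splits = []
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        splits.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return splits


def stable_predictors(significant_per_model, min_models=3):
    """Keep predictors that were significant in at least `min_models` models."""
    counts = Counter(v for s in significant_per_model for v in set(s))
    return sorted(v for v, c in counts.items() if c >= min_models)


# Five intermediate models (sample groups) on a toy dataset of 100 records.
splits = repeated_splits(100)
print([(len(train), len(test)) for train, test in splits])

# Hypothetical significance results, one set per intermediate model.
significant = [
    {"min_rh", "ffmc", "dc"},
    {"min_rh", "ffmc"},
    {"min_rh", "dc", "wind"},
    {"ffmc", "dc"},
    {"min_rh", "wind"},
]
print(stable_predictors(significant))  # ['dc', 'ffmc', 'min_rh']
```

In a full analysis, each training partition would be used to fit the LR and RF models, and the screened predictors would then be carried into the full-sample models.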
Source
《林业科学》
EI
CAS
CSCD
Peking University Core Journal
2016, Issue 1, pp. 89-98 (10 pages)
Scientia Silvae Sinicae
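Funding
Natural Science Foundation of Fujian Province (2015J05049)
Fujian Agriculture and Forestry University Key Project Construction Special Fund (6112C035K)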
Keywords
Tahe area
fire occurrence
meteorological factors
random forest algorithm
logistic regression