Abstract
Coal seam floor water inrush prediction generally distinguishes two states, safe and water inrush, and the resulting state data are imbalanced. Existing coal seam floor water inrush prediction models are mainly suited to balanced data; on imbalanced data sets their results are often "one-sided", that is, the prediction accuracy for the safe state is markedly higher than for the water inrush state, so overall prediction performance is low. To address this problem, a multi-decision tree prediction model for coal seam floor water inrush based on cost-sensitive theory is constructed. In this model, each decision tree takes a different water inrush influencing factor as its root node, and the node attribute selection criterion of each single decision tree combines cost-sensitive theory with the Gini index. This increases the penalty for misclassifying water inrush data (the minority class) and improves prediction performance for the water inrush state. A rule set is derived from each single decision tree water inrush prediction model, and the rule sets of all single trees are merged into the rule set of the multi-decision tree model. Applying the merged rule set yields multiple prediction results for the water inrush data, and the final prediction is obtained by majority voting. The experimental results show that as the penalty factor increases, the true positive rate first rises and then falls. Compared with a single decision tree water inrush prediction model based on the classification and regression tree (CART) algorithm, at a data imbalance ratio of 2 and a misclassification penalty factor of 4 the proposed model reaches a true positive rate of 93.06%, a true negative rate of 97.85%, and an accuracy of 96.25%, all better than the CART-based model. When the imbalance ratio rises to 6 and the penalty factor is 20, both models reach a true positive rate of 100%, while the proposed model attains a true negative rate of 99.37% and an accuracy of 99.47%, still outperforming the CART-based model. These results verify the effectiveness of the proposed model.
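The abstract names three ingredients without giving formulas: a Gini criterion combined with cost-sensitive theory, majority voting over several trees, and evaluation by true positive rate, true negative rate, and accuracy. The sketch below illustrates one common way to realize each idea. It is not the authors' formulation: scaling the minority-class count by the penalty factor, the label encoding (`INRUSH`, `SAFE`), the tie-breaking rule, and all function names are assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: water inrush is the positive (minority) class.
INRUSH, SAFE = 1, 0


def cost_weighted_gini(labels, penalty):
    """Gini impurity with the inrush count scaled by a penalty factor.

    Assumption: the cost enters by weighting minority-class samples;
    the paper may combine cost and Gini differently.
    """
    counts = Counter(labels)
    w_inrush = penalty * counts[INRUSH]
    w_safe = counts[SAFE]
    total = w_inrush + w_safe
    if total == 0:
        return 0.0  # empty node: treat as pure
    p = w_inrush / total
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def majority_vote(votes):
    """Combine per-tree predictions by majority rule.

    Ties go to INRUSH here, an assumed conservative choice.
    """
    counts = Counter(votes)
    return INRUSH if counts[INRUSH] >= counts[SAFE] else SAFE


def rates(y_true, y_pred):
    """Return (true positive rate, true negative rate, accuracy).

    Assumes both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == INRUSH and p == INRUSH for t, p in pairs)
    tn = sum(t == SAFE and p == SAFE for t, p in pairs)
    pos = sum(t == INRUSH for t in y_true)
    neg = len(y_true) - pos
    return tp / pos, tn / neg, (tp + tn) / len(y_true)
```

Under this weighting, a node holding one inrush sample and two safe samples has impurity 4/9 with a penalty of 1 and 0.5 with a penalty of 2, so splits that isolate inrush samples gain more as the penalty grows.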
Authors
李彦民
周晨阳
李凤莲
LI Yanmin; ZHOU Chenyang; LI Fenglian (College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China; College of Data Science, Taiyuan University of Technology, Jinzhong 030600, China)
Source
《工矿自动化》
PKU Core Journal
2020, No. 12, pp. 76-83 (8 pages)
Journal of Mine Automation
Funding
Natural Science Foundation of Shanxi Province (201801D121138)
Shanxi Province Talent Special Project (201605D211021)
Keywords
coal seam floor water inrush prediction
water inrush influencing factors
unbalanced data set
cost-sensitive theory
multi-decision trees