Abstract
Production data for hot-rolled strip from an iron and steel enterprise were analyzed with an improved random forest algorithm to uncover the implicit relationships between process parameters and product quality. The key process parameters affecting product quality were extracted as features, and a defect prediction model for hot-rolled strip was built. The experimental results show that balancing the imbalanced data set improves prediction accuracy, and that combining CART with C4.5 improves it further compared with using either method alone. In addition, because features may be highly or weakly correlated with the response variable, mutual information was used as the evaluation criterion for feature selection, which improves the classification performance of the random forest algorithm. With these three improvement strategies, the recognition rate of hot-rolled strip defects increased markedly.
Authors
纪英俊
勇晓玥
刘英林
刘士新
JI Ying-jun; YONG Xiao-yue; LIU Ying-lin; LIU Shi-xin (School of Information Science & Engineering, Northeastern University, Shenyang 110819, China; Big Data Department, Shanghai Baosight Software Co., Ltd., Shanghai 201203, China)
Source
《东北大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journal
2019, No. 1, pp. 11-15 (5 pages)
Journal of Northeastern University(Natural Science)
Funding
National Key Research and Development Program of China (2017YFB0306401)
National Natural Science Foundation of China (61573089)
Keywords
hot-rolled strip
defect prediction
data driven
feature selection
random forests