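Abstract: Objective: To establish a simple, convenient, and accurate model for predicting survival in patients with severe traumatic brain injury. Methods: Classification and regression tree (CART) analysis was applied to 639 patients with severe traumatic brain injury using eight prognostic factors; the outcome was survival or death at 6 months after injury. Results: The Glasgow Coma Scale (GCS) score was the best predictor; blood glucose and pupillary status were strong predictors; head CT findings and age were effective predictors; and white blood cell count also had a meaningful effect on prognosis. All variables in the CART were associated with outcome, and survival was predicted with an accuracy of 91.4%. Conclusion: The CART model predicts survival in patients with severe traumatic brain injury well and is a simple, effective, and accurate prediction method.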
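Abstract: To address overfitting in the construction of CART (classification and regression tree) classification trees caused by high feature dimensionality and noise in small sample sets, attribute reduction based on mutual-information rough sets (RS) is introduced into CART training. Considering the similarity between information entropy and the Gini coefficient in characterizing the "purity" of a sample set, attribute reduction is applied to historical fault data to lower the attribute dimension and optimize the training set; a classification decision tree is then built on the reduced set and its rules are output visually. Experimental results show that, when the improved CART algorithm is applied to oil-based fault diagnosis of a certain type of aero-engine, the extracted rules are highly interpretable and the influence of redundant attributes and noise on decisions is reduced. Compared with commonly used fault diagnosis algorithms, the model's diagnostic accuracy improves by about 20% and its AUC (area under curve) reaches 92%, so it can effectively handle high-dimensional, discrete, small-sample aero-engine fault problems.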
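The idea of scoring attributes by mutual information before building the tree can be sketched in a few lines. This is a simplified stand-in, not the rough-set reduction the abstract describes: it is a plain mutual-information filter over discrete attributes, and the attribute names (`fe_level`, `cu_level`), the data, and the `min_mi` threshold are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(attribute, labels):
    """I(A; Y) = H(Y) - H(Y | A) for a discrete attribute A."""
    n = len(labels)
    conditional = 0.0
    for value in set(attribute):
        subset = [y for a, y in zip(attribute, labels) if a == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def reduce_attributes(table, labels, min_mi=0.1):
    """Keep the attributes whose mutual information with the label
    reaches min_mi (an assumed, illustrative threshold)."""
    return [name for name, column in table.items()
            if mutual_information(column, labels) >= min_mi]


# Hypothetical discretized oil-analysis attributes and fault labels.
table = {
    "fe_level": ["high", "high", "low", "low", "high", "low", "high", "low"],
    "cu_level": ["high", "low", "high", "low", "high", "high", "low", "low"],
}
labels = ["fault", "fault", "normal", "normal",
          "fault", "normal", "fault", "normal"]

kept = reduce_attributes(table, labels)
```

Here `fe_level` determines the label completely and is kept, while `cu_level` is independent of the label and is dropped; a tree would then be trained on the kept attributes only.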
Funding: Supported in part by the National Science Foundation of USA (CMMI-1162482).
Abstract: Imbalanced data are frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For such datasets, improving the accuracy with which the minority class is identified is critically important. Feature selection is one way to address this issue: an effective feature selection method chooses a subset of features that favors accurate determination of the minority class. A decision tree is a classifier that can be built with different splitting criteria, and its advantage is that it is easy to see which feature is used at each splitting node. It is therefore possible to use a decision tree splitting criterion as a feature selection method. In this paper, an embedded feature selection method based on our proposed weighted Gini index (WGI) is presented. Comparison with the Chi2, F-statistic, and Gini index feature selection methods shows that F-statistic and Chi2 perform best when only a few features are selected. As the number of selected features increases, the proposed method has the highest probability of achieving the best performance. The area under the receiver operating characteristic curve (ROC AUC) and the F-measure are used as evaluation criteria. Experimental results on two datasets show that ROC AUC can be high even if only a few features are selected, and that it changes only slightly as more features are selected. The F-measure, however, reaches excellent values only if 20% or more of the features are chosen. The results can help practitioners select a suitable feature selection method for a practical problem.
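The two evaluation criteria named in the abstract can be computed directly from their definitions. The sketch below is a minimal illustration, not the paper's evaluation code; the labels, scores, and the 0.5 decision threshold are hypothetical. ROC AUC is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 8 majority (0) and 2 minority (1) cases.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.6, 0.45, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

auc = roc_auc(y_true, scores)   # 0.9375
f1 = f_measure(y_true, y_pred)  # 0.5
```

On this toy sample the ranking is nearly perfect (AUC 0.9375) while the thresholded predictions reach an F-measure of only 0.5, which shows that the two criteria can diverge on imbalanced data.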
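Abstract: Fuel tank shells have complex shapes, and defects such as side-wall wrinkling and fracture at the corner radii readily occur during deep drawing, so determining the forming process parameters is very important. Combining classification and regression tree (CART) artificial intelligence techniques with model cross-validation, the open-source Python library Scikit-Learn was used to mine knowledge from numerical simulation results of fuel tank shell deep drawing, and the process parameters with a large influence on the forming were screened out. Using minimization of the Gini index as the basis for selecting the optimal feature and the optimal split point, a CART decision tree relating process parameters to performance indicators was built, and reliable process design rules were extracted. The fuel tank shell deep-drawing example shows that knowledge discovery based on CART decision tree theory is a feasible way to mine the latent knowledge in numerical simulation results of sheet metal forming.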
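Selecting the optimal feature and split point by Gini index minimization can be sketched as an exhaustive search over candidate thresholds. This is a minimal pure-Python illustration of a single CART split, not the authors' Scikit-Learn workflow; the two process parameters (blank-holder force and friction coefficient) and all data values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the binary split
    that minimizes the size-weighted Gini impurity of the two child nodes."""
    best = (None, None, float("inf"))
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical simulation results:
# (blank-holder force in kN, friction coefficient) -> forming outcome.
rows = [(100, 0.10), (120, 0.12), (140, 0.10),
        (160, 0.12), (180, 0.10), (200, 0.12)]
labels = ["wrinkle", "wrinkle", "ok", "ok", "crack", "crack"]

feature, threshold, score = best_split(rows, labels)
```

For this data the search picks feature 0 (the force) at a threshold of 130.0 with a weighted Gini of 1/3, because the friction coefficient cannot separate the outcomes at all. A full CART tree repeats the same search recursively on each child node.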