Abstract
In recent years, the number of patients with coronary heart disease (CHD) has kept rising. Ensemble learning predicts CHD risk well, which can lower patients' medical costs and make CHD screening more efficient. This study uses a public CHD dataset from the Kaggle platform. First, the data are preprocessed and feature indicators are screened, and the SMOTE algorithm is applied to balance the classes, yielding 7,010 records. Next, three ensemble learning algorithms (Random Forest, XGBoost, and LightGBM) are used to build CHD risk prediction models, and their hyperparameters are tuned with Bayesian optimization; the data are split into training and test sets at a 7:3 ratio for model training and prediction. Finally, the three models are compared on accuracy, recall, AUC, and other metrics. All three ensemble models perform well, with LightGBM performing best, which demonstrates the feasibility of applying ensemble learning to CHD risk prediction.
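The data-side steps named in the abstract (SMOTE class balancing, a 7:3 train/test split, and evaluation by accuracy, recall, and AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes numeric feature vectors and binary labels (1 = positive CHD outcome, 0 = negative), uses only the standard library (Python 3.8+ for `math.dist`), and every function name (`smote`, `balance`, `train_test_split`, `accuracy`, `recall`, `auc`) is hypothetical. The Random Forest, XGBoost, and LightGBM models and the Bayesian hyperparameter search are not shown; any classifier that outputs labels and scores could feed the metric functions.

```python
# Hypothetical helpers sketching the abstract's pipeline; not the authors' implementation.
import random
from math import dist  # Euclidean distance, Python 3.8+


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each new sample lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample the minority class (labels 0/1) until both classes are equal."""
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    new = smote(minority, abs(len(pos) - len(neg)), k, rng)
    return X + new, y + [label] * len(new)


def train_test_split(X, y, train_frac=0.7, seed=0):
    """Shuffle, then cut into train/test parts (7:3 by default)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of true positives (label 1) that the model flags as positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    return tp / positives


def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. Quadratic in the class sizes; fine for a demo."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical values): 2 positives, 6 negatives.
    X = [[1.0, 1.0], [2.0, 3.0]] + [[float(i), 0.0] for i in range(6)]
    y = [1, 1] + [0] * 6
    Xb, yb = balance(X, y)
    Xtr, ytr, Xte, yte = train_test_split(Xb, yb)
    print(len(Xtr), len(Xte), sum(yb), len(yb) - sum(yb))  # prints: 8 4 6 6
```

Following the abstract's order, the sketch balances the full dataset before splitting; a common alternative is to oversample only the training split so that no synthetic points end up in the test set.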
Authors
SU Wenxing
ZHANG Zhenyi
ZHENG Yanli
TANG Lin
SONG Yuantao
(School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China; School of Emergency Management Science and Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; Tianjin TEDA Puhua International Hospital, Tianjin 300457, China; College of Economics and Management, Xi'an University of Technology, Xi'an 710045, China)
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2022, No. 7, pp. 8-13, 19 (7 pages)
Keywords
coronary heart disease
ensemble learning
Bayesian optimization
SMOTE
risk prediction model