Abstract
Objective To develop a machine learning-based model for predicting the 10-year risk of heart failure and to improve its interpretability with the SHAP method, thereby increasing the accuracy and clinical utility of heart failure risk assessment.
Methods Data came from the UK Biobank (UKB), which covers 502,349 UK adults aged 40-70 years, using baseline data collected in 2006-2010. The analysis included 487,572 participants who did not develop heart failure and 10,374 who did during 10 years of follow-up; heart failure events were defined by ICD-10 codes. Prediction models were built with three machine learning algorithms: LightGBM, XGBoost and CatBoost. Data preprocessing, feature selection and model performance evaluation were carried out in Python and RStudio, and the SHAP method was used to visualize and interpret the models' predictions.
Results After the samples were balanced by random undersampling, the models effectively predicted the onset of heart failure within 10 years. LightGBM showed the best predictive performance, followed by CatBoost and XGBoost. SHAP value analysis identified age, cystatin C, the number of treatments or medications taken, a previous diagnosis of cardiovascular disease, and cardiovascular disease-related polygenic risk scores as important predictors of heart failure risk.
Conclusion Machine learning models are effective for predicting heart failure risk, with LightGBM performing best among the models compared. SHAP value analysis offers a new perspective on what drives the models' predictions and can support clinical decision-making and risk management.
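The random undersampling step named in the Results can be illustrated with a minimal sketch. The abstract does not state the sampling ratio or the library used, so the 1:1 target ratio, the function name `random_undersample`, and the synthetic labels below are assumptions made only for illustration; this is not the authors' implementation.

```python
import random
from collections import Counter


def random_undersample(labels, seed=0):
    """Return sorted row indices that keep every minority-class row and an
    equally sized random subset of each larger class (assumed 1:1 ratio)."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # The smallest class sets the number of rows kept per class.
    n_keep = min(len(rows) for rows in by_class.values())

    kept = []
    for rows in by_class.values():
        # Sampling without replacement; the minority class is kept in full.
        kept.extend(rng.sample(rows, n_keep))
    return sorted(kept)


# Synthetic labels at roughly the abstract's class ratio (about 47 non-events
# per event), scaled down. 1 = heart failure event, 0 = no event.
labels = [1] * 100 + [0] * 4700
kept = random_undersample(labels, seed=42)
balanced = Counter(labels[i] for i in kept)
print(balanced[0], balanced[1])  # 100 100
```

Undersampling of this kind is usually applied to the training split only, so that evaluation still reflects the original class ratio; whether the study did so is not stated in the abstract.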
Authors
CAI Jia-yin (蔡佳音); CHEN Hai-tao (陈海涛); WANG Zeng-wu (王增武)
Division of Prevention and Community Health, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing 102308, China; School of Public Health, Shenzhen, Sun Yat-Sen University, 518107 Shenzhen, China
Source
Chinese Journal of Cardiovascular Research (《中国心血管病研究》)
CAS
2024, Issue 4, pp. 323-330 (8 pages)
Funding
Commissioned Project of the National Health Commission (NHC 2020-609).