Abstract
Background Diabetic retinopathy (DR) is one of the main complications of diabetes; its progressive course can lead to visual impairment and even blindness. Identifying the clinical factors that influence DR progression is important for preventing, controlling and managing DR in patients with diabetes.
Objective To explore the risk factors for DR in patients with type 2 diabetes mellitus using machine learning algorithms and Shapley additive explanations (SHAP) analysis.
Methods Clinical data of 3000 patients with type 2 diabetes mellitus from the "Chinese PLA General Hospital early warning data set of diabetes complications", published by the "National Population and Health Science Data Sharing Platform", were analyzed retrospectively. Baseline analysis and difference tests were performed on 58 observed variables between patients without DR (non-diabetic retinopathy, NDR group) and patients with DR (DR group). Three machine learning algorithms (XGBoost, random forest and logistic regression) were evaluated. Recursive feature elimination (RFE) with XGBoost was used to select the predictor variables of the optimal model and to rank them by feature weight. SHAP was then applied to interpret the model's risk factors.
Results The incidences or index levels of hypertension (systolic/diastolic blood pressure), glycosylated hemoglobin (HbA1c), blood lipids (total cholesterol, low-density lipoprotein), stroke, kidney disease (blood urea, serum creatinine, serum uric acid), renal failure and lower extremity artery disease were higher in the DR group than in the NDR group (all P<0.05), whereas age and the incidences of coronary heart disease, myocardial infarction, hyperlipidemia and atherosclerosis were lower in the DR group (P<0.05). XGBoost performed better than the other models. Its ten most important distinguishing features were kidney disease, coronary heart disease, lower extremity artery disease, height, other tumors, HbA1c, blood urea, serum albumin, renal failure and hyperlipidemia. In the SHAP summary scatter plot, the most important variables in the XGBoost model were, in order, HbA1c (0.59), kidney disease (0.44), blood urea (0.32) and lower extremity artery disease (0.25); all four had SHAP values >0 with high absolute values. Their SHAP value distributions also separated clearly, marking them as significant risk factors for DR. HbA1c, kidney disease and blood urea showed potential interactions in their effect on the course of DR, and the risk of DR rose markedly when blood urea exceeded 5 mmol/L.
Conclusion The XGBoost algorithm and the SHAP model can be used to predict risk factors for DR in patients with diabetes and to interpret interactions among feature variables, suggesting …
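The SHAP analysis described in the abstract is built on Shapley values, which split one prediction into per-feature contributions. The following is a minimal sketch of that idea, not the study's pipeline: `toy_risk`, its coefficients, the baseline patient and the example patient are all hypothetical, and the values are computed by brute-force enumeration of feature coalitions rather than by the tree-specific algorithms normally used for models such as XGBoost. Only the 5 mmol/L blood urea cut-off is taken from the abstract; the size of its effect is invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    A feature outside the coalition is replaced by its baseline value.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight of a coalition of size k that excludes `name`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


def toy_risk(sample):
    """Hypothetical risk score; coefficients are invented for illustration."""
    score = 0.5 * (sample["hba1c"] - 7.0) + 0.8 * sample["nephropathy"]
    if sample["blood_urea"] > 5.0:  # cut-off from the abstract, effect size assumed
        # Assumed interaction: raised urea counts for more alongside kidney disease.
        score += 0.3 + 0.2 * sample["nephropathy"]
    return score


# Hypothetical reference patient and patient to explain.
baseline = {"hba1c": 7.0, "blood_urea": 4.0, "nephropathy": 0}
patient = {"hba1c": 9.0, "blood_urea": 6.5, "nephropathy": 1}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The contributions always sum to the difference between the patient's score and the baseline score, and the interaction term is shared equally between blood urea and kidney disease. That sharing is what makes interaction effects visible in SHAP dependence plots.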
Authors
宋亚男
武惠韬
应俊
李琬悦
陈康
刘铁城
张卯年
张颖
SONG Ya'nan; WU Huitao; YING Jun; LI Wanyue; CHEN Kang; LIU Tiecheng; ZHANG Maonian; ZHANG Ying (Big Data Center, Medical Innovation Research Division, Chinese PLA General Hospital, Beijing 100853, China; Information Management Department, Chinese PLA General Hospital, Beijing 100853, China; Chinese PLA Medical School, Beijing 100853, China; Department of Endocrinology, the First Medical Center, Chinese PLA General Hospital, Beijing 100853, China; Department of Ophthalmology, the First Medical Center, Chinese PLA General Hospital, Beijing 100853, China)
Source
《解放军医学院学报》
CAS
Peking University Core Journals (北大核心)
2021, Issue 9, pp. 906-912, 992 (8 pages in total)
Academic Journal of Chinese PLA Medical School
Funding
Medical Big Data Research and Development Project of Chinese PLA General Hospital (2017MBD-020).