Abstract
Objective This study aimed to analyze the factors associated with type 2 diabetes mellitus combined with hypertension and to construct a predictive model, with the goal of supporting early detection and treatment.

Methods A total of 475 patients with type 2 diabetes mellitus combined with hypertension were selected as the case group. They were treated in the Endocrinology Departments of the Affiliated Hospital and the Second Affiliated Hospital of Guangdong Medical University from March to December 2022. A control group of 505 healthy individuals who underwent physical examinations at the health examination center during the same period was also selected. The feature variables selected by least absolute shrinkage and selection operator (LASSO) regression were used as inputs to random forest (RF), extreme gradient boosting (XGBoost), and logistic regression models. The optimal predictive model was obtained through Bayesian optimization and iterative training with cross-validation. Finally, feature importance ranking and Shapley additive explanations (SHAP) were used for interpretation analysis.

Results Feature selection identified the following main feature variables: urine glucose (GLU) (OR = 1.189, 95% CI 1.170–1.208, P < 0.05), family history of diabetes (OR = 1.341, 95% CI 1.273–1.411, P < 0.05), age (OR = 1.006, 95% CI 1.004–1.009, P < 0.05), body mass index (BMI) (OR = 1.017, 95% CI 1.010–1.023, P < 0.05), heart rate (HR) (OR = 1.004, 95% CI 1.003–1.006, P < 0.05), education level (OR = 0.954, 95% CI 0.934–0.975, P < 0.05), and place of residence (OR = 0.958, 95% CI 0.931–0.985, P < 0.05). After hyperparameter tuning, both the RF and XGBoost models outperformed the logistic regression model. XGBoost achieved an accuracy of 92.85%, slightly higher than the 92.34% achieved by RF. Feature importance ranked the factors influencing type 2 diabetes mellitus combined with hypertension in the following order: GLU, family history of diabetes, education level, place of residence, age, BMI, and HR. Among these, GLU, family history of diabetes, age, BMI, and HR were risk factors, whereas education level and place of residence were protective factors.

Conclusion The XGBoost-based prediction model for type 2 diabetes mellitus combined with hypertension performed best among the three models evaluated. Using SHAP to enhance the model's interpretability makes it possible to identify risk factors for the disease. This provides a reference for the prevention of type 2 diabetes mellitus combined with hypertension.
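The abstract labels each factor as a risk or protective factor according to its odds ratio. A minimal sketch of that reasoning follows. It assumes the reported ORs are exponentiated logistic-regression coefficients with Wald-type 95% intervals, which the abstract does not state explicitly. Under that assumption, the coefficient is β = ln(OR), and its standard error is approximately [ln(upper) − ln(lower)] / (2 × 1.96). An interval lying entirely above 1 marks a risk factor, and one lying entirely below 1 marks a protective factor. The helper names `log_odds_and_se` and `classify` are hypothetical. The numbers are the ORs and CIs reported above.

```python
import math

# Reported odds ratios and 95% CIs from the abstract: (OR, lower, upper).
REPORTED = {
    "GLU": (1.189, 1.170, 1.208),
    "family history of diabetes": (1.341, 1.273, 1.411),
    "age": (1.006, 1.004, 1.009),
    "BMI": (1.017, 1.010, 1.023),
    "HR": (1.004, 1.003, 1.006),
    "education level": (0.954, 0.934, 0.975),
    "place of residence": (0.958, 0.931, 0.985),
}

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_odds_and_se(odds_ratio, lower, upper):
    """Hypothetical helper: recover beta = ln(OR) and an approximate Wald
    standard error from the width of the CI on the log-odds scale."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return beta, se


def classify(odds_ratio, lower, upper):
    """Hypothetical helper: label a factor by where its CI lies relative to OR = 1."""
    if lower > 1:
        return "risk"
    if upper < 1:
        return "protective"
    return "not significant"


if __name__ == "__main__":
    for name, (or_, lo, hi) in REPORTED.items():
        beta, se = log_odds_and_se(or_, lo, hi)
        print(f"{name:28s} beta={beta:+.4f} se={se:.4f} -> {classify(or_, lo, hi)}")
```

With these inputs, the five factors the abstract calls risk factors have intervals above 1, and the two it calls protective factors have intervals below 1. All seven intervals exclude 1, which is consistent with P < 0.05.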
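SHAP attributes an individual prediction to its input features using Shapley values. A feature's Shapley value is its marginal contribution, averaged over every order in which features could be added. The sketch below computes exact Shapley values by enumerating coalitions. Features outside a coalition are set to a single baseline (reference) input. It illustrates the definition only. It is not the `shap` library's API and not the study's fitted XGBoost model. For tree ensembles, the `shap` library uses a tree-specific algorithm and averages over background data. The functions `shapley_values` and `toy_log_odds`, the three-feature toy model, its coefficients, and the example inputs are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x. Features outside a coalition
    take their baseline value (a single-reference value function)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their observed value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_log_odds(z):
    """Hypothetical log-odds model with an interaction term; the coefficients
    are illustrative only and are not fitted to the study data."""
    glu, family_history, age = z
    return -3.0 + 0.8 * glu + 0.6 * family_history + 0.02 * age + 0.3 * glu * family_history


if __name__ == "__main__":
    x = [2.0, 1.0, 60.0]         # hypothetical patient: GLU score, family history (0/1), age
    baseline = [0.0, 0.0, 45.0]  # hypothetical reference input
    phi = shapley_values(toy_log_odds, x, baseline)
    print([round(p, 6) for p in phi])  # [1.9, 0.9, 0.3]
    # Efficiency property: attributions sum to f(x) - f(baseline).
    print(round(sum(phi), 6), round(toy_log_odds(x) - toy_log_odds(baseline), 6))
```

The interaction term's contribution (0.3 × 2 × 1 = 0.6) is split equally between GLU and family history. The attributions also sum exactly to the change in predicted log-odds. This additivity is what lets SHAP values be read as per-feature contributions to a single prediction.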
Authors
MA Yong (马勇); KONG Danli (孔丹莉); YE Xiangyang (叶向阳); DING Yuanlin (丁元林)
(School of Public Health, Guangdong Medical University; The Standing Committee of the People's Congress of Dongguan, Dongguan 523000, China)
Source
Journal of Guangdong Medical University (《广东医科大学学报》)
2024, No. 5, pp. 523–534 (12 pages)
Funding
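Guangdong Basic and Applied Basic Research Foundation, Regional Joint Fund Project (Key Project) (2020B1515120021)
Guangdong Medical University Discipline Construction Project (4SG21276P).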