Abstract
Objective To explore the application of logistic regression and random forest to predicting the risk of diabetes mellitus in a health check-up population.
Methods We selected 11,769 check-up records (person-visits) of non-diabetic individuals examined at the Physical Examination Center of Beijing Aerospace General Hospital between January 2006 and December 2015. A random 70% of the samples served as the training set. With 14 factors (sex, age, body mass index, history of smoking, history of alcohol consumption, previous history of hypertension, family history of hypertension, family history of diabetes, systolic pressure, diastolic pressure, fasting blood glucose, total cholesterol, triglycerides and fatty liver) as independent variables and whether diabetes developed within 5 years as the dependent variable, we built one diabetes prediction model with logistic regression and another with random forest. The remaining 30% of the samples served as the validation set, on which the predictive performance of the two models was evaluated by the area under the receiver operating characteristic curve (AUC).
Results The AUC was 0.912 (95% CI: 0.898-0.927) for the logistic regression model and 0.919 (95% CI: 0.906-0.932) for the random forest model. At the optimal cutoff, the logistic regression model had a sensitivity of 80.8% and a specificity of 87.3%, and the random forest model had a sensitivity of 84.1% and a specificity of 85.3%.
Conclusions Both the logistic regression model and the random forest model show good predictive performance for diabetes mellitus risk in the health check-up population.
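The validation workflow described in Methods (random 70/30 split, AUC on the held-out set, sensitivity and specificity at an "optimal cutoff") can be sketched in plain Python. This is a minimal sketch, assuming each validation record already carries a predicted risk score from one of the two models; model fitting itself is not shown. The abstract does not state how the optimal cutoff was chosen, so Youden's index (sensitivity + specificity - 1) is assumed here as a common choice. The function names and toy data are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_70_30(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign record indices to a 70% training and a 30% validation set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability that a random case (label 1)
    scores higher than a random non-case (label 0); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels: Sequence[int],
                  scores: Sequence[float]) -> Tuple[float, float, float]:
    """Assumed cutoff rule: return (cutoff, sensitivity, specificity) that
    maximises Youden's J = sens + spec - 1, calling a record positive
    when its score >= cutoff."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best: Optional[Tuple[float, float, float]] = None
    for c in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= c)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (c, sens, spec)
    assert best is not None
    return best


if __name__ == "__main__":
    # Hypothetical validation labels (1 = developed diabetes) and model scores.
    labels = [0, 0, 0, 0, 1, 0, 1, 1]
    scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
    print(round(auc(labels, scores), 4))  # 0.9333
    print(youden_cutoff(labels, scores))  # (0.4, 1.0, 0.8)
```

The pairwise AUC loop is quadratic in the number of records, which keeps the rank interpretation explicit; for a validation set of several thousand records a sort-based rank computation would be the usual replacement.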
Source
Practical Preventive Medicine (实用预防医学), 2018, No. 1, pp. 116-119 (4 pages)
Indexed in: CAS
Keywords
diabetes mellitus
physical examination
logistic regression
random forest