Abstract
Objective To analyze the factors associated with retinopathy in type 2 diabetes mellitus and to build a risk prediction model, within an epidemiological study design, using the random forest algorithm (a machine learning method) and the logistic regression algorithm.
Methods Electronic medical record data from inpatients with type 2 diabetes mellitus at the Chinese PLA General Hospital, 2011-2013, were used. The analysis drew mainly on diabetes diagnosis data, glycated hemoglobin data, and biochemical test data. Logistic regression and random forest models were fitted, and their predictive performance was compared by the area under the ROC curve.
Results Among the importance scores of the 39 variables in the random forest model, glycated hemoglobin (HbA1c), fasting glucose, urea, creatinine, uric acid, age, coronary heart disease (CHD), and chronic kidney disease (CKD) scored comparatively high and were clinically meaningful. The logistic regression model finally incorporated six factors: sex, blood glucose control (HbA1c concentration), CKD, CHD, myocardial infarction, and cancer. The area under the ROC curve indicated that the random forest model predicted better than the logistic regression model.
Conclusion The random forest analysis in this study yielded an importance score for each factor, which provides some basis for the early diagnosis of retinopathy in type 2 diabetes mellitus and for optimizing the clinical diagnostic workflow.
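The model comparison described above rests on the area under the ROC curve. As a minimal sketch of that metric only, the Python below computes it as the share of positive/negative case pairs in which the positive case receives the higher predicted risk. The function name `roc_auc`, the labels, and both score lists are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up illustration data, NOT taken from the study.
# 1 = retinopathy present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical predicted risks from two models on the same patients.
scores_model_a = [0.9, 0.8, 0.35, 0.3, 0.2, 0.4, 0.1, 0.7]
scores_model_b = [0.9, 0.4, 0.6, 0.5, 0.2, 0.7, 0.1, 0.8]

auc_a = roc_auc(y_true, scores_model_a)
auc_b = roc_auc(y_true, scores_model_b)
print(f"model A AUC = {auc_a:.4f}, model B AUC = {auc_b:.4f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; for a dataset of real size a rank-based formula or an established statistics library would be the more practical choice.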
Source
China Medical Devices (《中国医疗设备》)
2016, No. 3, pp. 33-38, 69 (7 pages in total)
Funding
National Natural Science Foundation of China (61501518)