Abstract
Objective To screen for risk factors of preeclampsia and to construct a preeclampsia prediction model based on a machine learning algorithm.
Methods Clinical data of 1609 hospitalized pregnant women from January 2016 to December 2018 were collected from the big data platform of the Academy of Medical Data Science of Chongqing Medical University and analyzed retrospectively. The women were divided into a preeclampsia group (n=291) and a non-preeclampsia group (n=1318) according to whether preeclampsia occurred during hospitalization. The clinical data of 70% of the patients were randomly selected as the training set (n=1126) to construct the prediction model, and the remaining 30% served as the test set (n=483) for validation; a consistency check between the training set and the test set was performed. Independent risk factors were screened by univariate analysis and logistic regression analysis. The optimal parameters of the LightGBM algorithm were searched by 5-fold cross-validation, and the prediction model was then built with the LightGBM machine learning algorithm.
Results A total of 58 indicators were collected; 13 indicators with a missing rate ≥30% were excluded, and 45 indicators were finally included. Thirty-five indicators differed significantly between the preeclampsia group and the non-preeclampsia group (P<0.05), including gamma-glutamyl transferase (GGT), alanine aminotransferase (ALT), thrombin time, aspartate aminotransferase (AST), and urine specific gravity. Logistic regression analysis showed that urine specific gravity, uric acid, mean corpuscular hemoglobin concentration, globulin, platelet distribution width, potassium ion, age at visit, family history of hypertension, systolic blood pressure, diastolic blood pressure, pulse, and gestational age ≥34 weeks were independent risk factors for preeclampsia. Five-fold cross-validation showed that the LightGBM model performed best when num_leaves=5, max_depth=3, min_data_in_leaf=91, feature_fraction=0.8, bagging_fraction=0.6, and bagging_freq=5, with an area under the curve (AUC) of 0.964, a sensitivity of 84.9%, and a specificity of 92.7%.
Conclusion The preeclampsia prediction model built with the LightGBM machine learning algorithm shows good predictive performance, can effectively predict the occurrence of preeclampsia in pregnant women in the Chongqing area, and can provide a decision-making reference for clinicians.
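The abstract gives no code, so the following is only a minimal sketch of the steps it describes: the missing-rate filter, the 70/30 random split, and the evaluation metrics, written in plain Python. The function names, the random seed, and the `"objective"` entry are assumptions made for illustration. The remaining hyperparameter values are those reported in the abstract. Training a model with LightGBM itself is not shown.

```python
import random

# Hyperparameters reported in the abstract as optimal under 5-fold cross-validation.
# "objective" is an assumption (binary outcome); it is not stated in the abstract.
LIGHTGBM_PARAMS = {
    "objective": "binary",
    "num_leaves": 5,
    "max_depth": 3,
    "min_data_in_leaf": 91,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.6,
    "bagging_freq": 5,
}


def drop_high_missing(table, threshold=0.30):
    """Keep only columns whose share of missing (None) values is below threshold."""
    kept = {}
    for name, values in table.items():
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate < threshold:
            kept[name] = values
    return kept


def split_indices(n, train_fraction=0.70, seed=0):
    """Randomly split row indices 0..n-1 into training and test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # the seed is arbitrary (assumption)
    n_train = int(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# With the cohort size from the abstract, a 70/30 split reproduces the reported set sizes.
train_idx, test_idx = split_indices(1609)
print(len(train_idx), len(test_idx))  # 1126 483
```

The pairwise AUC above is quadratic in the number of cases, which is acceptable for a test set of a few hundred rows. A rank-based implementation from an established library would be the more usual choice in practice.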
Authors
郑江元
祝锐
颜永杰
周洋
罗亚玲
Zheng Jiang-Yuan; Zhu Rui; Yan Yong-Jie; Zhou Yang; Luo Ya-Ling (College of Medical Informatics, Chongqing Medical University, Chongqing 400016, China; Medical Data Science Academy, Chongqing Medical University, Chongqing 400016, China)
Source
《解放军医学杂志》
CAS
CSCD
PKU Core (北大核心)
2022, Issue 8, pp. 802-808 (7 pages)
Medical Journal of Chinese People's Liberation Army
Funding
National Social Science Fund of China (15BGL191).
Keywords
preeclampsia (子痫前期)
machine learning (机器学习)
prediction model (预测模型)