Abstract
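Objective: To screen variables using a long short-term memory (LSTM) network with an attention mechanism and L1-regularized Logistic regression, and then to build and evaluate a conventional Logistic regression model that predicts the risk of in-hospital death in stroke patients in the intensive care unit (ICU). Methods: Stroke patients in the Medical Information Mart for Intensive Care-Ⅳ (MIMIC-Ⅳ) database were selected as the study population, with in-hospital death as the outcome variable. Candidate predictors included demographic characteristics, comorbidities, and laboratory tests and vital signs recorded within 48 h of admission. The data were randomly split into training and test sets 10 times at a ratio of 8∶2, stratified by the outcome. The LSTM and L1-regularized Logistic regression models were built on the training sets; on the test sets, the union of the 10 most important variables from each model was entered into a Logistic regression to build the prediction model. The model was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy, and was compared with a forward-selection Logistic regression model built without prior variable screening. Results: A total of 2755 stroke patients with 2979 ICU admission records were included, of which 17.66% were records of in-hospital death. Of the two variable-screening models, the L1-regularized Logistic regression had a significantly higher AUC than the LSTM (0.819±0.031 vs. 0.760±0.018, P<0.001); the variables ranked in the top 10 by both models included age, blood glucose, and blood urea nitrogen. The final prediction model had an AUC of 0.85, a sensitivity of 85.98%, a specificity of 71.74%, and an accuracy of 74.26%, outperforming the forward-selection Logistic regression model built without prior variable screening. Conclusion: The variables selected by the attention-based LSTM and the L1-regularized Logistic regression predicted the outcome well and have some clinical value.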
Objective: To select variables related to the mortality risk of stroke patients in the intensive care unit (ICU) using long short-term memory (LSTM) with attention mechanisms and Logistic regression with L1 norm, to construct a mortality risk prediction model based on conventional Logistic regression with the important variables selected by the two models, and to evaluate the model performance. Methods: The Medical Information Mart for Intensive Care (MIMIC)-Ⅳ database was retrospectively analyzed, and patients with a primary diagnosis of stroke were selected as the study population. The outcome was defined as whether the patient died in hospital after admission. Candidate predictors included demographic information, comorbidities, laboratory tests, and vital signs in the initial 48 h after ICU admission. The data were randomly divided into a training set and a test set ten times at a ratio of 8∶2. In the training sets, LSTM with attention mechanisms and Logistic regression with L1 norm were constructed to select important variables. In the test sets, the mean importance of each variable over the ten splits was used to pick the top 10 variables in each of the two models, and these variables were then included in a conventional Logistic regression to build the final prediction model. Model evaluation was based on the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The model performance was compared with that of a forward Logistic regression model built without prior variable selection. Results: A total of 2755 patients with 2979 ICU admission records were included in the analysis, 526 of which were records of in-hospital death. The AUC of the Logistic regression model with L1 norm was significantly higher than that of LSTM with attention mechanisms (0.819±0.031 vs. 0.760±0.018, P<0.001). Age, blood glucose, and blood urea nitrogen were among the top ten important variables in both models. AUC, sensitivity, specificity, and accuracy of Logistic regression models were 0.85, 85.98%, 71.74% and 74.26%
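The split, variable-union, and evaluation steps named in the Methods can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' code: the function names (`stratified_split`, `top_k_union`, `auc`, `classification_metrics`) are hypothetical, the split is assumed to be made per ICU record, and the importance scores are assumed to come from models fitted elsewhere (the LSTM and L1-regularized fits themselves are not shown).

```python
import random
from statistics import mean

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices 8:2 (by default) while keeping the outcome ratio in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def top_k_union(importances_a, importances_b, k=10):
    """Each argument maps a variable name to its importance scores, one per split.
    Rank by mean importance, take the top k from each model, return the union."""
    def top(imp):
        ranked = sorted(imp, key=lambda name: mean(imp[name]), reverse=True)
        return ranked[:k]
    return sorted(set(top(importances_a)) | set(top(importances_b)))

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def classification_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary 0/1 labels."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)

# Outcome counts taken from the abstract: 526 deaths among 2979 ICU records.
labels = [1] * 526 + [0] * (2979 - 526)
# Ten random 8:2 splits, one per seed (the seeds are an assumption).
splits = [stratified_split(labels, seed=s) for s in range(10)]
```

Splitting within each outcome class is one way to keep the death rate near 17.66% in every training and test set, which matters when the positive class is this small.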
Authors
邓宇含
姜勇
王子尧
刘爽
汪雨欣
刘宝花
DENG Yu-han; JIANG Yong; WANG Zi-yao; LIU Shuang; WANG Yu-xin; LIU Bao-hua (Department of Social Medicine and Health Education, Peking University School of Public Health, Beijing 100191, China; China National Clinical Research Center for Neurological Diseases, Department of Neurology, Beijing Tian Tan Hospital, Capital Medical University, Beijing 100050, China; Beijing Advanced Innovation Center for Big Data-Based Precision Medicine (Beihang University & Capital Medical University), Beijing 100070, China)
Source
《北京大学学报(医学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2022, Issue 3, pp. 458-467 (10 pages)
Journal of Peking University: Health Sciences
Funding
National Key Research and Development Program of China (2018YFC1311700, 2018YFC1311703).