Abstract
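Objective: To explore the risk factors for early-onset colorectal cancer (EOCRC) and late-onset colorectal cancer (LOCRC) using Logistic regression and support vector machine (SVM), to establish risk prediction models for different age groups, and to compare their predictive performance. Methods: Patients diagnosed with colorectal cancer from 2012 to 2022 were selected. Demographic characteristics, clinical manifestations, past history, family history, lifestyle, physical examination, laboratory tests, and pathological diagnosis were recorded. Risk prediction models were established separately, and the two models were compared on area under the ROC curve (AUROC), accuracy, precision, recall, and F1 score. Results: Taking both models together, EOCRC risk showed a strong positive correlation with clinical manifestations such as gastrointestinal bleeding, abdominal distension and pain, and changes in bowel habits, as well as with weight loss and elevated tumor markers; it showed a weaker positive correlation with marital status, history of appendectomy, history of diabetes, history of dyslipidemia, and family history of colorectal cancer. LOCRC risk showed a strong positive correlation with marital status, the presence of clinical manifestations, weight loss, dyslipidemia, and elevated tumor markers, and a moderate positive correlation with age, smoking, history of appendectomy, and family history of colorectal cancer. The AUROC, accuracy, and F1 score of the two models differed little, but the Logistic regression model had higher precision while the SVM model had higher recall. Conclusions: The risk factors for EOCRC and LOCRC are not entirely the same; marital status, smoking, dyslipidemia, and family history of tumors contribute less to EOCRC than to LOCRC. Compared with Logistic regression, SVM can identify more risk factors for colorectal cancer and can find as many potential colorectal cancer patients as possible.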
Objective: By exploring and comparing the risk factors of early-onset colorectal cancer (EOCRC) and late-onset colorectal cancer (LOCRC) using Logistic regression and support vector machine (SVM), this study aimed to construct age-based risk prediction models for people at risk of colorectal cancer (CRC) and to compare the predictive performance of the two models, so as to provide an effective methodology for the prevention of EOCRC. Methods: Patients diagnosed with CRC were enrolled. Demographic characteristics, clinical manifestations, past history, family history, lifestyle, and data from physical examinations, laboratory tests, and pathological diagnosis were collected. Risk prediction models for EOCRC and LOCRC were established, and the two models were compared on area under the ROC curve (AUROC), accuracy, precision, recall, and F1 score. Results: EOCRC was strongly and positively correlated with gastrointestinal bleeding, abdominal distension and pain, changes in bowel habits, weight loss, and elevated tumor markers, and more weakly correlated with marital status, appendectomy history, diabetes history, history of dyslipidemia, and family history of CRC. For LOCRC, marital status, clinical manifestations, weight loss, dyslipidemia, and elevated tumor markers were strongly correlated factors, and a positive association was also found with age, smoking, appendectomy history, and family history of CRC. The Logistic regression and SVM models had similar AUROC, accuracy, and F1 scores, but the Logistic regression model had higher precision and the SVM model had higher recall. Conclusions: The risk factors of EOCRC and LOCRC are not identical. SVM is capable of identifying more CRC-related risk factors. Moreover, the comparison between the two models reveals a higher recall for the SVM model, which implies that it is likely to identify more potential CRC patients.
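The precision and recall trade-off reported for the two models can be illustrated with a minimal sketch of how accuracy, precision, recall, and F1 score are computed from binary predictions. The labels and predictions below are hypothetical toy values, not data from the study, and the helper name `binary_metrics` is an assumption introduced only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for 0/1 labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels: 4 cases (1) and 4 non-cases (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical conservative classifier: flags few subjects, all of them correctly.
pred_a = [1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical liberal classifier: catches every case but raises false alarms.
pred_b = [1, 1, 1, 1, 1, 1, 0, 0]

cautious = binary_metrics(y_true, pred_a)
liberal = binary_metrics(y_true, pred_b)
print("conservative:", cautious)
print("liberal:", liberal)
```

In this toy setting the conservative classifier has higher precision and the liberal one has higher recall, which is the kind of contrast the abstract describes; a high-recall model may suit screening, where missing a patient is costlier than a false alarm.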
Authors
薛亦诚
刘超
杨贵淞
齐宏
XUE Yi-cheng; LIU Chao; YANG Gui-song; QI Hong (Department of Gastrointestinal Surgery II, Qingdao Municipal Hospital Affiliated to Qingdao University, Qingdao 266071, China)
Source
《中国现代普通外科进展》
CAS
2024, Issue 3, pp. 195-198 (4 pages)
Chinese Journal of Current Advances in General Surgery
Funding
Supported by the Qingdao Medical and Health Outstanding Talent Training Program (青卫科教字[2019]6号).