Abstract
Using sample data on colon cancer from a clinical trial, this paper carries out an empirical analysis with the glmnet package in R. The genes that have a large effect on colon cancer are screened out, a Logistic regression model for sparse data is then built, and the model is interpreted in light of the study background. In addition, stepwise regression is performed in SAS to obtain a stepwise Logistic regression model. The Logistic regression models obtained by the two methods are compared, and the data sparsity problem is handled successfully.
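For orientation, the sparse Logistic model fitted by glmnet can be written as a penalized likelihood problem. The abstract does not give the formula, so the following is only a sketch under assumptions: a binary response $y_i \in \{0,1\}$, a gene-expression vector $x_i$ with $p$ genes for each of $n$ samples, and the lasso penalty (glmnet's default, `alpha = 1`). All symbols are introduced here for illustration and are not taken from the paper.

```latex
(\hat\beta_0, \hat\beta)
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \left[ y_i \left( \beta_0 + x_i^{\top}\beta \right)
               - \log\!\left( 1 + e^{\beta_0 + x_i^{\top}\beta} \right) \right]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

Under this formulation, the genes with nonzero $\hat\beta_j$ are the ones retained, and a larger $\lambda$ gives a sparser model. The tuning parameter $\lambda$ is commonly chosen by cross-validation (for example with `cv.glmnet`), although the abstract does not say how it was chosen here.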
Authors
王纯杰
刘斌霞
蒋京京
WANG Chun-jie; LIU Bin-xia; JIANG Jing-jing (School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, China)
Source
《吉林师范大学学报(自然科学版)》
2019, Issue 4, pp. 28-39 (12 pages)
Journal of Jilin Normal University:Natural Science Edition
Funding
National Natural Science Foundation of China (11571051, 11301037)
Jilin Provincial Department of Education "13th Five-Year Plan" Project (2016317).