Abstract
Large-scale questionnaire surveys inevitably suffer from missing data: both item non-response and invalid responses degrade the quality of data analysis and the accuracy of the decisions that rely on it. Imputing missing data in a large-scale questionnaire survey can be treated as a matrix completion (MC) problem and handled with low-rank matrix recovery techniques. Under missing ratios of 5%, 10%, 20%, 40%, and 50%, this paper applies an MC method based on the singular value thresholding algorithm to recover the missing data and compares it with four commonly used missing-data methods: hot deck imputation, K-nearest neighbor imputation, multiple imputation by chained equations, and linear interpolation. The results show that the MC method has a clear advantage in both imputation accuracy and imputation error, and can therefore provide a more reliable complete data set for large-scale questionnaire surveys. The MC method can thus serve as a reference when choosing a missing-data method for such surveys.
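As a rough illustration of the kind of method the abstract refers to, the following is a minimal sketch of matrix completion by singular value thresholding. It is not the authors' implementation: the function name `svt_complete`, the default threshold `tau`, the step size `delta`, the stopping rule, and the synthetic rank-2 test matrix are all assumptions made for this example, and real questionnaire data would additionally need encoding and rounding to valid answer categories.

```python
import numpy as np


def svt_complete(M, mask, tau=None, delta=None, max_iter=1000, tol=1e-4):
    """Fill unobserved entries of M by singular value thresholding (sketch).

    M    : 2-D array; values where mask is False are ignored (may be NaN).
    mask : boolean array of the same shape, True where an entry is observed.
    tau, delta : threshold and step size; the defaults below are assumed
                 heuristics, not values taken from the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    M = np.where(mask, M, 0.0)              # zero out unobserved entries
    n1, n2 = M.shape
    if tau is None:
        tau = 5.0 * np.sqrt(n1 * n2)        # assumed heuristic threshold
    if delta is None:
        delta = 1.2 * n1 * n2 / mask.sum()  # assumed heuristic step size
    Y = np.zeros_like(M)                    # dual variable
    X = np.zeros_like(M)                    # current low-rank estimate
    norm_obs = np.linalg.norm(M)
    for _ in range(max_iter):
        # Shrinkage step: soft-threshold the singular values of Y by tau.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s = np.maximum(s - tau, 0.0)
        X = (U * s) @ Vt
        # Residual on observed entries only.
        resid = np.where(mask, M - X, 0.0)
        if np.linalg.norm(resid) <= tol * norm_obs:
            break
        Y += delta * resid                  # dual ascent step
    return X


# Hypothetical usage: a synthetic rank-2 matrix with about 20% of entries hidden.
rng = np.random.default_rng(0)
truth = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
observed = rng.random(truth.shape) > 0.2
estimate = svt_complete(truth, observed)
hidden = ~observed
rel_err = np.linalg.norm((estimate - truth)[hidden]) / np.linalg.norm(truth[hidden])
print(f"relative error on hidden entries: {rel_err:.4f}")
```

The relative error on the hidden entries plays the role of an imputation-error measure here; repeating the run with different hiding ratios would mimic, in a simplified way, the comparison across missing ratios described in the abstract.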
Authors
GAO Hai-yan (高海燕)
LI Wei-xin (李唯欣)
NIU Cheng-ying (牛成英)
Lanzhou University of Finance and Economics, Lanzhou 730020, China
Source
Journal of Hubei Normal University: Natural Science (《湖北师范大学学报(自然科学版)》)
2023, Issue 3, pp. 1-8 (8 pages)
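Funding
National Social Science Fund of China (19XTJ002)
Natural Science Foundation of Gansu Province (23JRRA1186)
Gansu Province Outstanding Graduate Student "Innovation Star" Project (2022CXZX-701)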
Keywords
large-scale questionnaire data
matrix completion
missing data imputation