Abstract
To address the problem of imputing missing values in functional data, this paper proposes a functional multiple imputation method that incorporates class information, developed within the framework of functional data analysis. Building on the MissForest (MF) model, the method uses PACE, a functional imputation method based on principal component analysis through conditional expectation, for the initial imputation, and then applies K-means clustering so that the imputation draws on the correlation among samples. Simulation experiments show that, across missing rates from 5% to 55%, the proposed method achieves better imputation accuracy and effectiveness than seven other imputation methods: Hot.deck, MF, mean imputation, PACE, MFP, SFI, and HFI. An application to stock data further confirms that the values imputed by the proposed method are consistent with actual conditions and patterns.
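The pipeline outlined in the abstract (initial imputation, clustering of the filled curves, then re-imputation that uses class information) can be illustrated with a minimal sketch. This is not the authors' implementation: a pointwise mean stands in for the PACE initial imputation, a within-cluster pointwise mean stands in for the MissForest regression step, and all function names, the simulated curves, and the 20% missing rate are assumptions made only for illustration.

```python
import math
import random


def column_means(curves, mask, fallback=None):
    """Mean of the observed values at each time point.

    If a time point has no observed value, use fallback[j] (or 0.0).
    """
    means = []
    for j in range(len(curves[0])):
        obs = [c[j] for c, m in zip(curves, mask) if m[j]]
        if obs:
            means.append(sum(obs) / len(obs))
        else:
            means.append(fallback[j] if fallback else 0.0)
    return means


def fill(curves, mask, means):
    """Keep observed values; replace missing ones by the given means."""
    return [[c[j] if m[j] else means[j] for j in range(len(c))]
            for c, m in zip(curves, mask)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=20):
    """Plain K-means with a deterministic farthest-point initialisation."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda g: sq_dist(r, centers[g])) for r in rows]
        for g in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == g]
            if members:
                centers[g] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_impute(curves, mask, k):
    """Initial fill -> K-means -> refill using only curves of the same cluster."""
    global_means = column_means(curves, mask)
    initial = fill(curves, mask, global_means)   # stand-in for PACE (assumption)
    labels = kmeans(initial, k)
    result = [row[:] for row in initial]
    for g in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        if not idx:
            continue
        sub_curves = [curves[i] for i in idx]
        sub_mask = [mask[i] for i in idx]
        # stand-in for the MissForest step (assumption): within-cluster mean
        means = column_means(sub_curves, sub_mask, fallback=global_means)
        for i, row in zip(idx, fill(sub_curves, sub_mask, means)):
            result[i] = row
    return result


def rmse_on_missing(truth, imputed, mask):
    """Root mean squared error over the entries that were masked out."""
    errs = [(t - v) ** 2
            for trow, vrow, mrow in zip(truth, imputed, mask)
            for t, v, m in zip(trow, vrow, mrow) if not m]
    return math.sqrt(sum(errs) / len(errs))


# Simulated functional data: two groups of noisy curves on a common grid.
random.seed(0)
grid = [t / 19 for t in range(20)]
truth = [[(1.0 if i % 2 == 0 else -1.0) * 2.0 * math.sin(math.pi * t)
          + random.gauss(0, 0.1) for t in grid] for i in range(40)]
mask = [[random.random() > 0.2 for _ in grid] for _ in truth]  # True = observed
observed = [[v if m else None for v, m in zip(row, mrow)]
            for row, mrow in zip(truth, mask)]

baseline = fill(observed, mask, column_means(observed, mask))
imputed = cluster_impute(observed, mask, k=2)
rmse_global = rmse_on_missing(truth, baseline, mask)
rmse_cluster = rmse_on_missing(truth, imputed, mask)
print("mean imputation RMSE:", rmse_global)
print("cluster-informed RMSE:", rmse_cluster)
```

In this toy setting the two groups of curves have opposite shapes, so a global pointwise mean carries little information, whereas the within-cluster refill can borrow from similar samples. A fuller version would replace both stand-ins with PACE and a random-forest regression, which this sketch does not attempt.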
Authors
GAO Hai-yan (高海燕)
LI Wei-xin (李唯欣)
MA Wen-juan (马文娟)
School of Statistics and Data Science, Lanzhou University of Finance and Economics, Lanzhou, Gansu 730020, China; Key Laboratory of Digital Economy and Social Computing Science, Lanzhou University of Finance and Economics, Lanzhou, Gansu 730020, China
Source
Journal of China West Normal University (Natural Sciences) (《西华师范大学学报(自然科学版)》)
2024, No. 5, pp. 481-487 (7 pages)
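Funding
National Social Science Fund of China (19XTJ002)
Natural Science Foundation of Gansu Province (23JRRA1186)
Gansu Province Outstanding Graduate Student "Innovation Star" Project (2022CXZX-701, 2023CXZX-703)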
Keywords
functional data
MissForest
multiple imputation
missing-data imputation method