Abstract
Dimensionality reduction is one of the most common ways of dealing with the "curse of dimensionality" in high-dimensional data. It not only mitigates the negative effects of the curse of dimensionality but also removes redundant features, thereby improving the efficiency of tasks such as regression and classification on high-dimensional data. High-dimensional data usually exhibit complex or nonlinear structure, and an appropriate dimensionality reduction method can effectively project the high-dimensional features into a low-dimensional space, achieving nonlinear feature extraction from the original data. This paper uses an unsupervised learning model, the sparse autoencoder network, to extract nonlinear features from high-dimensional financial data, and feeds the extracted features into a supervised learning model, a BP neural network, to predict index returns. Furthermore, to verify the advantages and effectiveness of the sparse autoencoder algorithm for feature extraction, sparse principal component analysis is introduced for comparison. The empirical analysis shows that the sparse autoencoder network extracts nonlinear features well and that its prediction accuracy is better than that of linear dimensionality reduction methods represented by sparse principal component analysis.
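The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes a single hidden layer with sigmoid units, a linear decoder, and a KL-divergence sparsity penalty, which is one common formulation of a sparse autoencoder and not necessarily the one used in the paper. The function name `train_sparse_autoencoder`, the hyperparameters (`n_hidden`, `rho`, `beta`, `lr`, `epochs`), and the synthetic data are all hypothetical; the paper's financial data and its BP neural network predictor are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, n_hidden=3, rho=0.05, beta=0.1,
                             lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent on reconstruction error plus a KL sparsity penalty.

    All hyperparameter values are illustrative assumptions, not values from the paper.
    Returns the encoder weights, encoder bias, and the loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))   # decoder weights
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)               # hidden code (the extracted features)
        X_hat = H @ W2 + b2                    # linear reconstruction
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1.0 - 1e-6)  # mean activation per unit
        recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
        losses.append(recon + beta * kl)
        # Backpropagation: gradients of the loss with respect to each layer.
        d_out = (X_hat - X) / n
        d_sparse = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat)) / n
        d_pre = (d_out @ W2.T + d_sparse) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_pre)
        b1 -= lr * d_pre.sum(axis=0)
    return W1, b1, losses

# Synthetic stand-in for high-dimensional data: 8 columns driven by 2 latent
# factors through nonlinear maps, with noisy redundant copies of each column.
data_rng = np.random.default_rng(42)
latent = data_rng.normal(size=(200, 2))
base = np.column_stack([np.sin(latent[:, 0]), latent[:, 0] ** 2,
                        np.tanh(latent[:, 1]), latent[:, 0] * latent[:, 1]])
raw = np.hstack([base, base + 0.1 * data_rng.normal(size=base.shape)])
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # standardize each column

W1, b1, losses = train_sparse_autoencoder(X)
features = sigmoid(X @ W1 + b1)                  # low-dimensional features
print(features.shape, losses[0], losses[-1])
```

In a pipeline like the one the abstract describes, `features` would then serve as the input to a supervised predictor such as a BP neural network, and the same predictor could be trained on sparse principal components for comparison.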
Authors
陈艳
俞文强
CHEN Yan; YU Wen-qiang (Business School, Hunan University, Changsha 410082, China; School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China)
Source
《数理统计与管理》
CSSCI
Peking University Core Journal
2021, No. 1, pp. 93-104 (12 pages)
Journal of Applied Statistics and Management
Funding
National Natural Science Foundation of China (Grant Nos. 71571113, 91546202)
Fundamental Research Funds for the Central Universities
Keywords
sparse autoencoder
sparse principal component analysis
BP neural network
index forecasting
nonlinear feature extraction