Abstract
In the era of big data, the continuing improvement of deep learning algorithms has enriched the analytical methods available in quantitative investment. Among the many quantitative investment strategies, multi-factor stock selection is favored by investors for its stable returns. Using the Tushare Pro financial big data platform and the JoinQuant (聚宽) quantitative trading platform, this paper takes daily data for the constituent stocks of the CSI 300 index from October 2009 to March 2019 as its research object. To cover the factors that affect stock prices as fully as possible, 117 factors in four categories (market, financial, technical, and investor sentiment) are selected to form the initial factor pool. To ensure data quality, the data are preprocessed for lags, missing values, and standardization. Following the idea of model ensembling, six models (Pearson correlation coefficient, distance correlation coefficient, Elastic Net based on the AIC criterion, Elastic Net based on the BIC criterion, random forest, and GBDT) are combined to score the importance of each factor, and 68 factors are retained. A self-attention neural network then uses the factor data of the past 60 trading days to predict the price trend of each constituent stock over the next month. The 50 stocks with the highest predicted probability of rising are selected to form an equally weighted portfolio, which is updated monthly. Empirical results show that the strategy achieves higher returns and lower risk than the CSI 300 index.
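The two selection steps described in the abstract (combining several importance scores to keep 68 of 117 factors, then holding the 50 stocks with the highest predicted probability of rising at equal weight) can be sketched as follows. This is only an illustrative sketch: the abstract does not say how the six model scores are combined, so averaging per-model ranks is an assumption, and the function names `ensemble_select` and `equal_weight_top_k` as well as the random placeholder inputs are hypothetical rather than taken from the paper.

```python
import numpy as np


def ensemble_select(scores: np.ndarray, n_keep: int) -> np.ndarray:
    """Keep the n_keep factors with the best average rank across models.

    scores has shape (n_models, n_factors); larger means more important.
    Averaging ranks is an assumed aggregation rule, not the paper's.
    """
    # Rank factors within each model: 0 = least important.
    ranks = scores.argsort(axis=1).argsort(axis=1)
    mean_rank = ranks.mean(axis=0)
    # Sort factors from highest to lowest average rank.
    order = np.argsort(-mean_rank, kind="stable")
    return np.sort(order[:n_keep])


def equal_weight_top_k(prob_up: np.ndarray, k: int) -> np.ndarray:
    """Assign weight 1/k to the k stocks with the highest predicted
    probability of rising, and 0 to all others."""
    top = np.argsort(-prob_up, kind="stable")[:k]
    weights = np.zeros_like(prob_up, dtype=float)
    weights[top] = 1.0 / k
    return weights


# Placeholder inputs only: random numbers standing in for real model output.
rng = np.random.default_rng(0)
factor_scores = rng.random((6, 117))   # 6 scoring models, 117 candidate factors
kept_factors = ensemble_select(factor_scores, n_keep=68)

predicted_prob = rng.random(300)       # one probability per CSI 300 constituent
portfolio = equal_weight_top_k(predicted_prob, k=50)
```

In a monthly rebalancing loop, `equal_weight_top_k` would be called once per month on that month's predictions, so each of the 50 selected stocks receives 2% of the capital.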
Authors
ZHANG Hu (张虎)
SHEN Han-lei (沈寒蕾)
LIU Ye-cheng (刘晔诚)
School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China
Source
Journal of Applied Statistics and Management (《数理统计与管理》)
CSSCI
Peking University Core Journal (北大核心)
2020, Issue 3, pp. 556-570 (15 pages)
Funding
Publicly tendered research project of the Fourth National Economic Census (JJPCZB23)
Special project for central universities (412/31510000111)
Keywords
multi-factor stock selection
quantitative investment
ensemble learning
self-attention neural network