Variable Selection in Network-linked Data with FDR Control
Abstract: Statistical inference for network-linked data has become an active topic in statistics in recent years. The assumption that samples are independent, standard in traditional models, usually fails to meet the needs of modern network-linked data analysis. This paper studies the individual (independent) effect of each node in network-linked data and uses a fusion penalty so that the individual effects of connected nodes are pulled toward each other. It also uses the Knockoff method, which imitates the dependence structure of the original variables to construct features unrelated to the target variable, and on this basis proposes a Knockoff method for variable selection in network-linked data (NLKF). The paper proves theoretically that NLKF controls the false discovery rate (FDR) of variable selection at the target level. When the covariance of the original data is unknown, the method retains these statistical properties with an estimated covariance matrix. A comparison with the traditional Lasso variable selection method shows that NLKF is reliable and has higher statistical power. Finally, an application in factor investing is given, using 200 factors for 4 000 stocks in the Chinese A-share market from January to December 2022, with the network constructed from each stock's Shenyin Wanguo (Shenwan) first-level industry classification.
Authors: LU Ying (卢滢); LI Yang (李阳) (Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China)
Source: Computer Systems & Applications (《计算机系统应用》), 2024, No. 5, pp. 28-36 (9 pages)
Funding: National Natural Science Foundation of China (12101584).
Keywords: network-linked data; variable selection; Knockoff method; false discovery rate (FDR)
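
The abstract states that the Knockoff-based procedure keeps the FDR at a target level. As a rough illustration of how a generic knockoff filter turns feature statistics into a selection, the sketch below implements the standard knockoff+ threshold. It is not the paper's NLKF procedure: the fusion-penalized fit and the knockoff construction are not shown, and the function names, the coefficient-difference statistic, and the toy values of `W` are assumptions made for this example.

```python
import numpy as np


def coef_diff_statistic(beta_orig, beta_knock):
    """Assumed statistic: W_j = |beta_j| - |beta_knockoff_j|.

    Large positive W_j suggests the original feature matters more
    than its knockoff copy. Other statistics are possible.
    """
    return np.abs(np.asarray(beta_orig, float)) - np.abs(np.asarray(beta_knock, float))


def knockoff_plus_threshold(W, q):
    """Smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    W = np.asarray(W, dtype=float)
    # Only the nonzero magnitudes of W can be thresholds.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")  # no threshold qualifies: select nothing


def knockoff_select(W, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))


# Toy statistics (made up for illustration), target FDR level q = 0.2.
W = [5, 4, 3, 2.5, 2, -1, 1.5, 0.5, -0.2, 3.5]
selected = knockoff_select(W, q=0.2)
```

The threshold is data-dependent: negative statistics act as an estimate of how many false positives sit among the positive ones, so a looser `q` generally admits more features.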