Aggregate Estimation of Parameters in Generalized Linear Model with Massive Data

Cited by: 2
Abstract: This paper studies parameter estimation algorithms for generalized linear models with massive data. Conventional maximum likelihood or quasi-likelihood estimating-equation algorithms use all observations in every iteration, which leads to insufficient storage space and a heavy computational burden; the estimation method is improved to address this problem. Combining the divide-and-conquer algorithm with the Newton-Raphson algorithm, an aggregated quasi-likelihood estimating-equation algorithm is proposed for solving generalized linear model parameters in both single-machine and distributed parallel environments, and the asymptotic properties of the aggregated quasi-likelihood estimator are further studied. The results show that, when the number of data blocks satisfies certain conditions, the aggregated quasi-likelihood estimator has the same asymptotic properties as the maximum quasi-likelihood estimator computed directly from all the data. In numerical simulations, the algorithm is implemented both on a single machine and on a Spark cluster; the results show that the aggregated quasi-likelihood estimation method resolves the data storage problem while improving computational efficiency. Finally, the algorithm is used to estimate the parameters of a Probit model, and the fitted model is applied to a supersymmetric particle classification problem.
Authors: CHEN Shao-dong; LI Zhi-qiang (College of Mathematics and Science, Beijing University of Chemical Technology, Beijing 100029, China)
Source: Journal of Statistics and Information (《统计与信息论坛》), CSSCI, Peking University Core Journal, 2020, No. 7, pp. 18-24 (7 pages)
Keywords: generalized linear model; massive data; divide and conquer algorithm; aggregated quasi-likelihood estimation equation
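
The abstract describes the divide-and-conquer idea only in words, so the following is a minimal single-machine sketch of how such a scheme can look, not the authors' implementation. Several things here are assumptions: a logistic model stands in for the Probit model to keep the Newton-Raphson step short; the aggregation rule (an information-weighted average of the block estimates) is one common choice and is not stated in the abstract; and the function names `newton_raphson_logistic` and `aggregate_estimate` are hypothetical.

```python
import numpy as np


def newton_raphson_logistic(X, y, n_iter=25, tol=1e-8):
    """Solve the logistic score equation on one data block.

    Returns the block estimate and the information matrix evaluated at it.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))            # fitted probabilities
        score = X.T @ (y - mu)                          # estimating equation
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])   # negative Jacobian of the score
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, info


def aggregate_estimate(X, y, n_blocks):
    """Combine block estimates with an information-weighted average.

    Assumed rule: beta_agg = (sum_k A_k)^{-1} sum_k A_k beta_k, where A_k is
    the information matrix of block k. Only one block is processed at a time.
    """
    p = X.shape[1]
    info_sum = np.zeros((p, p))
    weighted_sum = np.zeros(p)
    for X_k, y_k in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks)):
        beta_k, info_k = newton_raphson_logistic(X_k, y_k)
        info_sum += info_k
        weighted_sum += info_k @ beta_k
    return np.linalg.solve(info_sum, weighted_sum)


# Small simulated example (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20000, 3))
beta_demo = np.array([0.5, -1.0, 0.25])
y_demo = (rng.random(20000) < 1.0 / (1.0 + np.exp(-X_demo @ beta_demo))).astype(float)

beta_full, _ = newton_raphson_logistic(X_demo, y_demo)      # uses all data at once
beta_agg = aggregate_estimate(X_demo, y_demo, n_blocks=10)  # uses one block at a time
print("full-data estimate:", beta_full)
print("aggregated estimate:", beta_agg)
```

Because each block contributes only a p-vector and a p-by-p matrix, the per-block work could in principle be distributed, for example as a map step on a Spark cluster followed by a sum; that mapping is likewise an assumption about how the described implementation might be organized.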