Abstract
To describe qualitative attributes in a regression model, dummy variables usually have to be introduced. For regression equations that contain dummy variables, a method was proposed to measure how important each dummy variable is in the regression equation. The method decomposes the regression sum of squares of such an equation into a dummy-variable part and a non-dummy-variable part, calculates the proportion that each part contributes to the equation, and takes this proportion as the index of relative importance of each dummy variable in the regression equation. Experiments on nearly 100,000 loan records from the Lending Club and Prosper online lending datasets examined the influence of loan purpose on the borrowing success rate and the influence of credit grade on the loan interest rate. The results show that, whereas a traditional regression equation only provides the coefficient of each dummy variable and cannot show its importance, the proposed method reveals the different importance of different dummy variables, providing an important means of quantitatively analyzing how strongly qualitative independent variables influence the dependent variable in a regression equation.
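The abstract does not state the exact decomposition formulas. The sketch below is therefore a minimal illustration under stated assumptions, not the authors' method: it assumes an ordinary least-squares fit and uses the standard identity SSR = Σ_j b_j S_jy, where b_j is the slope of predictor j and S_jy is the centered cross-product of predictor j with y. This identity splits the regression sum of squares additively across predictors, so the dummy-variable part is the sum of the dummy columns' terms and each term divided by SSR serves as a share. The helper names `dummy_encode`, `solve` and `ssr_shares` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dummy_encode(values: Sequence[str], baseline: str) -> Tuple[List[str], List[List[float]]]:
    """Encode a qualitative attribute with k levels as k-1 0/1 dummy columns.

    The baseline level gets no column (all zeros), which avoids perfect
    collinearity with the intercept.
    """
    levels = sorted(set(values) - {baseline})
    rows = [[1.0 if v == lev else 0.0 for lev in levels] for v in values]
    return levels, rows


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ssr_shares(
    X: Sequence[Sequence[float]], y: Sequence[float], dummy_cols: Sequence[int]
) -> Tuple[List[float], float, List[float], float]:
    """Fit OLS with intercept and split SSR additively across predictors.

    Returns (slopes, SSR, per-predictor shares of SSR, total dummy share).
    Assumes SSR > 0 and a nonsingular centered cross-product matrix.
    """
    n, p = len(y), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    # Centering removes the intercept from the normal equations.
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    sxx = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]
    sxy = [sum(r[j] * v for r, v in zip(Xc, yc)) for j in range(p)]
    b = solve(sxx, sxy)  # OLS slopes
    # SSR = b' Sxx b = b' Sxy = sum_j b_j * S_jy
    contrib = [b[j] * sxy[j] for j in range(p)]
    ssr = sum(contrib)
    shares = [c / ssr for c in contrib]
    dummy_share = sum(shares[j] for j in dummy_cols)
    return b, ssr, shares, dummy_share
```

The shares always sum to 1, but an individual term b_j S_jy can be negative when predictors are correlated, so under this assumed decomposition a share is a signed contribution rather than a strictly positive weight.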
Source
Journal of Computer Applications
CSCD
Peking University Core Journal
2017, No. 11, pp. 3048-3052 (5 pages)
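Funding
National Natural Science Foundation of China (61672157)
Fujian Normal University Innovation Team Project on Key Theories and Technologies of Network and Information Security (IRTL1207)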
Keywords
qualitative attribute
regression equation
dummy variable
index