Abstract
The Box-Cox transformation and the Johnson transformation are the two most widely used normalizing transformations in data preprocessing. Comparing how they differ and which kinds of data each one suits provides a reference for data preprocessing and a theoretical starting point for developing methods with better applicability and validity. This paper first analyzes the two methods mathematically, then tests their accuracy and the data they are suited to on several random samples with different distributional characteristics. The study reaches the following conclusions. First, the Box-Cox transformation is unidirectional, whereas the Johnson transformation is bidirectional and symmetric. Second, the Box-Cox transformation changes skewness markedly, while the Johnson transformation changes kurtosis markedly. Third, in application the Johnson transformation is more complicated and gives better results overall. Finally, the Johnson transformation is suited to normalizing variables whose skewness is slight but performs poorly on variables with pronounced skewness; the Box-Cox transformation can actually increase skewness when applied to variables whose skewness is slight, but it outperforms the Johnson transformation on variables with pronounced skewness.
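As a rough illustration of the two transformations compared in the abstract, the sketch below applies a Box-Cox power transform to a right-skewed sample and reports skewness and excess kurtosis before and after. It is not the authors' procedure: the grid search for lambda, the lognormal test sample, and all function names are assumptions made for this example, and only the unbounded (SU) member of the Johnson family is shown, with its parameters (gamma, delta, xi, lam) supplied by hand rather than fitted.

```python
import math
import random


def _standardized_moment(xs, k):
    """k-th standardized moment, using the population standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum(((x - mean) / std) ** k for x in xs) / n


def skewness(xs):
    return _standardized_moment(xs, 3)


def excess_kurtosis(xs):
    return _standardized_moment(xs, 4) - 3.0


def box_cox(xs, lam):
    """Box-Cox transform; the data must be strictly positive."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1.0) / lam for x in xs]


def box_cox_lambda(xs):
    """Choose lambda on a grid by maximizing the profile log-likelihood.

    The grid (-2.0 to 2.0 in steps of 0.05) is an assumption of this sketch.
    """
    n = len(xs)
    log_sum = sum(math.log(x) for x in xs)

    def loglik(lam):
        ys = box_cox(xs, lam)
        mean = sum(ys) / n
        var = sum((y - mean) ** 2 for y in ys) / n
        return -0.5 * n * math.log(var) + (lam - 1.0) * log_sum

    return max((i / 20 for i in range(-40, 41)), key=loglik)


def johnson_su(xs, gamma, delta, xi, lam):
    """Johnson SU transform: z = gamma + delta * asinh((x - xi) / lam)."""
    return [gamma + delta * math.asinh((x - xi) / lam) for x in xs]


# Hypothetical right-skewed sample (lognormal), fixed seed for repeatability.
rng = random.Random(0)
skewed = [rng.lognormvariate(0.0, 0.8) for _ in range(2000)]

lam_hat = box_cox_lambda(skewed)
transformed = box_cox(skewed, lam_hat)

print("lambda:", lam_hat)
print("skewness before/after:", skewness(skewed), skewness(transformed))
print("excess kurtosis before/after:",
      excess_kurtosis(skewed), excess_kurtosis(transformed))
```

The asinh in `johnson_su` is an odd function around xi, so it compresses both tails, while the Box-Cox power transform acts in one direction on positive data; this is one way to read the abstract's "bidirectional and symmetric" versus "unidirectional" contrast.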
Authors
王跃
张鹏新
党跃武
WANG Yue; ZHANG Peng-xin; DANG Yue-wu (School of Public Administration, Sichuan University, Chengdu 610065, China; Tarim Oilfield, CNPC, Korla 841000, China)
Source
《云南民族大学学报(自然科学版)》
CAS
2018, Issue 4, pp. 340-347 (8 pages)
Journal of Yunnan Minzu University:Natural Sciences Edition