Abstract
PCA (principal component analysis) is a widely used linear dimensionality reduction method. In practice, however, when the data set is large, the samples cannot all be loaded into memory at once for analysis and computation. This paper proposes a method for applying PCA to the dimensionality reduction of large-scale data without relying on the Hadoop cloud computing platform, which solves the problem that such data cannot be reduced directly. Practical application shows that the method works well.
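The abstract does not describe how the method works internally. When the samples do not fit in memory and no cluster framework is available, one common way to run PCA is a two-pass scheme. The first pass streams the data in chunks and keeps only the sample count, the per-feature sums and the d x d matrix of summed outer products, from which the mean and covariance follow. That covariance is small enough to eigendecompose in memory, and a second pass then projects each chunk onto the leading components. The Python sketch below illustrates this generic scheme and is not necessarily the author's method. `read_in_chunks` is a hypothetical stand-in for block-wise file reading, all other function names are illustrative, and pure-Python power iteration stands in for a linear algebra library so the example has no dependencies.

```python
import math
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def read_in_chunks(rows: Sequence[Vector], chunk_size: int) -> Iterator[Sequence[Vector]]:
    """Hypothetical stand-in for reading a large on-disk file block by block."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def streaming_covariance(chunks: Iterable[Sequence[Vector]], dim: int) -> Tuple[Vector, Matrix]:
    """Pass 1: only n, sum(x) and sum(x x^T) are kept in memory, never the full data."""
    n = 0
    s = [0.0] * dim
    ss = [[0.0] * dim for _ in range(dim)]
    for chunk in chunks:
        for x in chunk:
            n += 1
            for i in range(dim):
                s[i] += x[i]
                for j in range(dim):
                    ss[i][j] += x[i] * x[j]
    if n < 2:
        raise ValueError("need at least two samples")
    mean = [v / n for v in s]
    # Sample covariance: (sum(x x^T) - n * mean mean^T) / (n - 1)
    cov = [[(ss[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    return mean, cov


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _orthonormalize(v: Vector, basis: List[Vector]) -> Tuple[Vector, float]:
    """Gram-Schmidt against already-found components, then normalize."""
    for b in basis:
        p = _dot(v, b)
        v = [vi - p * bi for vi, bi in zip(v, b)]
    norm = math.sqrt(_dot(v, v))
    return ([vi / norm for vi in v] if norm > 1e-12 else v), norm


def top_components(cov: Matrix, k: int, iters: int = 1000, seed: int = 0) -> List[Vector]:
    """Leading k eigenvectors of the small d x d covariance via power iteration."""
    dim = len(cov)
    if not 1 <= k <= dim:
        raise ValueError("k must be between 1 and the feature dimension")
    rng = random.Random(seed)
    comps: List[Vector] = []
    for _ in range(k):
        v, _ = _orthonormalize([rng.uniform(-1.0, 1.0) for _ in range(dim)], comps)
        for _ in range(iters):
            # w = C v, kept orthogonal to earlier components
            w, norm = _orthonormalize([_dot(row, v) for row in cov], comps)
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = w
        comps.append(v)
    return comps


def project_chunks(chunks: Iterable[Sequence[Vector]], mean: Vector,
                   comps: List[Vector]) -> Iterator[Vector]:
    """Pass 2: center each sample and project it onto the k components."""
    for chunk in chunks:
        for x in chunk:
            centered = [xi - mi for xi, mi in zip(x, mean)]
            yield [_dot(c, centered) for c in comps]
```

Memory use is O(d^2) regardless of the number of samples, so the sketch assumes the feature dimension d is moderate. The one-pass formula sum(x x^T) - n * mean mean^T can lose precision when feature means are large relative to their spread. A production version would more likely center using a preliminary mean pass or merge per-chunk statistics, and call a library eigensolver such as `numpy.linalg.eigh` instead of power iteration.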
Source
Computer Knowledge and Technology (《电脑知识与技术》, back issues)
2014, Issue 3X, pp. 1835-1837 (3 pages)
Funding
Supported by the State Scholarship Fund of China (China Scholarship Council), project no. 201204190040
Keywords
principal component analysis (PCA)
dimensionality reduction
large-scale data (big data)