A Hybrid Algorithm for Text Classification Based on PCA and kNN
(Original title: 基于PCA和kNN混合算法的文本分类方法)
Cited by: 4
Abstract: With the rapid growth of text data, the high computational complexity of text classification has become a significant problem. The k-nearest-neighbor (kNN) algorithm is a simple and effective classifier, but it is computationally expensive. Principal Component Analysis (PCA) is commonly used as a preprocessing step to reduce dimensionality before kNN, yet this standard PCA-kNN pipeline still needs every vector in the projected space to perform the kNN search. We propose a new hybrid algorithm, PCA&kNN, that performs kNN with a small set of neighbors rather than the complete set of data vectors in the projected space, thereby reducing computational complexity. A new text is projected into the lower-dimensional space, and kNN is performed only with its neighbors along each axis, on the principle that vectors close to one another in the original space are also close in the projected space and along the projected components. To verify the effectiveness of the method, we ran experiments on the standard Reuters benchmark dataset. The results show that the proposed model significantly outperforms kNN and the standard PCA-kNN hybrid algorithm while maintaining similar classification accuracy.
Authors: SHI Miao (史淼), LIU Feng (刘锋) (School of Computer Science and Technology, Anhui University, Hefei 230601, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2015, No. 4, pp. 169-171 (3 pages)
Keywords: text classification; dimensionality reduction; PCA&kNN; hybrid classifier; term weighting
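
The abstract contrasts two search strategies in the projected space, and a small sketch may make the difference concrete. The code below is not the authors' implementation: the abstract does not say how the per-axis neighbors are chosen, so `axis_candidates` and its parameter `m` are hypothetical, as are all other function names and the synthetic toy data (which stands in for Reuters term vectors). The sketch assumes that, for each principal axis, the `m` training points whose coordinate on that axis is closest to the query's coordinate are kept, and that kNN then runs on the union of those points.

```python
import numpy as np
from collections import Counter


def fit_pca(X, n_components):
    """Return the training mean and the top principal axes (as rows)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, axes):
    """Project rows of X onto the principal axes."""
    return (X - mean) @ axes.T


def knn_vote(Z, y, q, k):
    """Majority vote among the k rows of Z closest to q (Euclidean)."""
    k = min(k, len(Z))
    dist = np.linalg.norm(Z - q, axis=1)
    nearest = np.argsort(dist)[:k]
    return Counter(y[nearest].tolist()).most_common(1)[0][0]


def axis_candidates(Z, q, m):
    """Assumed reading of the abstract: per axis, keep the m training
    points whose coordinate on that axis is closest to the query's."""
    m = min(m, len(Z))
    picked = set()
    for j in range(Z.shape[1]):
        order = np.argsort(np.abs(Z[:, j] - q[j]))[:m]
        picked.update(order.tolist())
    return np.array(sorted(picked))


def classify(Z, y, q, k, m=None):
    """m=None: standard PCA-kNN over all projected vectors.
    Otherwise: kNN restricted to the per-axis candidate set."""
    if m is None:
        return knn_vote(Z, y, q, k)
    idx = axis_candidates(Z, q, m)
    return knn_vote(Z[idx], y[idx], q, k)


# Toy data: two well-separated synthetic classes in 20 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 20)),
               rng.normal(5.0, 1.0, (50, 20))])
y = np.array([0] * 50 + [1] * 50)

mean, axes = fit_pca(X, n_components=3)
Z = project(X, mean, axes)
q = project(np.full((1, 20), 5.0), mean, axes)[0]

print(classify(Z, y, q, k=5))        # standard PCA-kNN
print(classify(Z, y, q, k=5, m=10))  # reduced candidate set
```

In this sketch the candidate set holds at most `m` points per axis, so the full-dimensional distance computation touches far fewer vectors than the standard pipeline. The per-axis `argsort` here still scans every training point; an implementation aiming for a real speed-up would presumably pre-sort the coordinates of each axis once and locate the query by binary search.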