
A Hybrid Feature Dimensionality Reduction Method for Text Classification (cited by 11)

English title as published: Mixed Method of Reducing Feature in Text Classification
Abstract: This paper proposes a hybrid method for reducing the dimensionality of text features that combines feature selection with feature extraction. After analyzing the respective characteristics of selection-based and extraction-based dimensionality reduction, the method first performs a preliminary selection on the feature set, using information about how differently each feature term is distributed across categories. A new PCA-based feature extraction method is then applied to the remaining features as a second reduction stage, achieving effective dimensionality reduction of the text features while keeping information loss to a minimum. Text classification experiments show that the method gives good classification performance.
Source: Computer Engineering (《计算机工程》), 2009, No. 2, pp. 194-196 (3 pages). Indexed in: CAS, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (grant No. 70571087)
Keywords: text classification; feature selection; feature extraction; Principal Component Analysis (PCA)
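The two-stage procedure in the abstract (select features by their category distribution, then compress the survivors with PCA) can be sketched roughly as follows. The abstract does not give the paper's exact selection criterion or the details of its "new" PCA-based extraction, so this is only a minimal sketch under stated assumptions: stage 1 uses a hypothetical score, `category_spread_score`, defined here as the variance across classes of the fraction of each class's documents that contain a term; stage 2 substitutes plain, standard PCA (power iteration with deflation) for the authors' method. All function names are illustrative, and only the Python standard library is used.

```python
import math
import random
from collections import defaultdict


def category_spread_score(docs, labels, term):
    """Hypothetical stage-1 score (not the paper's formula): variance, across
    classes, of the fraction of each class's documents containing `term`.
    Terms concentrated in few classes score high; evenly spread terms score ~0."""
    counts = defaultdict(int)  # class -> number of documents containing term
    totals = defaultdict(int)  # class -> total number of documents
    for doc, label in zip(docs, labels):
        totals[label] += 1
        if term in doc:
            counts[label] += 1
    rates = [counts[c] / totals[c] for c in totals]
    mean = sum(rates) / len(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)


def select_terms(docs, labels, k):
    """Stage 1: keep the k terms with the highest score (ties broken alphabetically)."""
    vocab = sorted({t for doc in docs for t in doc})
    ranked = sorted(vocab, key=lambda t: (-category_spread_score(docs, labels, t), t))
    return ranked[:k]


def to_matrix(docs, terms):
    """Term-frequency vectors restricted to the selected terms."""
    return [[doc.count(t) for t in terms] for doc in docs]


def pca_project(X, m, iters=200, seed=0):
    """Stage 2 stand-in: standard PCA via power iteration with deflation.
    Returns (rows projected onto the top-m components, explained-variance ratio)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center columns
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    total_var = sum(C[i][i] for i in range(d))
    rng = random.Random(seed)
    components, eigvals = [], []
    for _ in range(m):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        norm = 1.0
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        if norm < 1e-12:
            break  # remaining covariance is numerically zero; stop extracting
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        eigvals.append(lam)
        # Deflate: remove the found component so the next iteration finds the next one
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    Z = [[sum(r[j] * comp[j] for j in range(d)) for comp in components] for r in Xc]
    ratio = sum(eigvals) / total_var if total_var > 0 else 0.0
    return Z, ratio


if __name__ == "__main__":
    docs = [["ball", "goal", "team"], ["goal", "team", "match"],
            ["vote", "party", "team"], ["party", "vote", "law"]]
    labels = ["sport", "sport", "politics", "politics"]
    terms = select_terms(docs, labels, 3)
    print(terms)  # ['goal', 'party', 'vote']
    Z, ratio = pca_project(to_matrix(docs, terms), m=1)
    print(Z, ratio)
```

In this toy corpus, "team" occurs in both classes and is dropped in stage 1, while the three class-specific terms survive; because those three columns are perfectly correlated, a single principal component then retains essentially all of their variance, which is the "minimal information loss" property the second stage aims for.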