
An Improved Mutual Information Algorithm for Feature Selection (cited by: 7)
Abstract: In a hierarchical classification setting, we first compare six commonly used feature selection algorithms experimentally: document frequency, information gain, expected cross entropy, the χ² statistic, weight of evidence for text, and mutual information. Mutual information gives the worst classification performance. We then analyze the reason and, on that basis, propose an improved mutual information algorithm. Experimental results show that the improved mutual information algorithm outperforms the other algorithms. Removing single-character words also improves classification performance, which suggests that word features express semantic information more completely.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2006, No. 6, pp. 651-656 (6 pages)
Keywords: hierarchical classification; feature selection; mutual information; improved mutual information
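
For orientation, the baseline mutual information criterion that the abstract compares against can be sketched as follows. This is the standard term-category score used in the text categorization literature, estimated from document counts as log(A·N / ((A+C)(A+B))); it is not the improved algorithm, which the abstract does not specify. The function name, parameter names, and counts below are hypothetical. The example shows one commonly cited weakness of the plain score (it favors rare terms), which may or may not be the cause the authors analyze.

```python
import math

def mutual_information(a, b, c, n):
    """Standard term-category mutual information score from document counts.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    n: total number of documents in the corpus
    """
    if a == 0:
        # The term never co-occurs with the category.
        return float("-inf")
    # log( P(t, c) / (P(t) * P(c)) ) estimated as log( a*n / ((a+c) * (a+b)) )
    return math.log((a * n) / ((a + c) * (a + b)))

# Hypothetical corpus: 1000 documents, 100 of them in the category.
# A term seen once, only inside the category.
rare_term = mutual_information(a=1, b=0, c=99, n=1000)
# A term that covers 80 of the 100 category documents.
common_term = mutual_information(a=80, b=20, c=20, n=1000)

print(f"rare term:   {rare_term:.3f}")
print(f"common term: {common_term:.3f}")
```

With these made-up counts the term that occurs in a single document scores higher than the term that covers most of the category, because the score depends on the ratio of probabilities and ignores how often the term occurs.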