
A Study on Feature Dimension Reduction in Text Categorization

Cited by: 4
Abstract: This paper first introduces five common methods of feature selection and feature extraction. It then uses the K-nearest neighbor (KNN) algorithm as the evaluation classifier to compare the performance of four feature selection methods in text categorization (TC). Based on an analysis of the test results, it proposes improved, practical evaluation functions for feature selection (FS) based on mutual information. Experimental results show that the improved method is effective.
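The abstract does not give the formula of the improved mutual-information function, so the sketch below shows only the standard mutual-information (MI) term score that such improvements start from, as commonly defined in comparative studies of feature selection: with $N$ documents, $A$ documents of category $c$ containing term $t$, document frequency $df(t)$ and category size $N_c$, $MI(t,c) = \log\frac{A \cdot N}{df(t) \cdot N_c}$, and each term keeps its maximum over categories. The function names `mutual_information_scores` and `select_top_k`, and the toy corpus, are hypothetical; this is not the paper's improved method.

```python
import math
from collections import Counter


def mutual_information_scores(docs, labels):
    """Return {term: MI_max(term)} using document-frequency estimates.

    Standard MI criterion (not the paper's improved variant):
        A = number of documents in category c that contain term t
        MI(t, c) = log(A * N / (df(t) * N_c)) = log(P(t, c) / (P(t) * P(c)))
    Each term keeps its maximum score over all categories.
    """
    n = len(docs)
    class_sizes = Counter(labels)        # N_c: documents per category
    doc_freq = Counter()                 # df(t): documents containing t
    joint = Counter()                    # A: documents of category c containing t
    for terms, c in zip(docs, labels):
        for t in set(terms):             # count each term once per document
            doc_freq[t] += 1
            joint[(t, c)] += 1

    scores = {}
    for t, df in doc_freq.items():
        per_class = [
            math.log(joint[(t, c)] * n / (df * n_c))
            for c, n_c in class_sizes.items()
            if joint[(t, c)] > 0         # skip A = 0, where the log is undefined
        ]
        scores[t] = max(per_class)       # non-empty: t occurs in some category
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: two "sport" and two "finance" documents.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match"],
    ["stock", "market"],
    ["market", "price", "team"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = mutual_information_scores(docs, labels)
print(select_top_k(scores, 3))  # ['ball', 'goal', 'market']
```

The toy run also illustrates a well-known weakness of plain MI that motivates modified evaluation functions: "ball", which appears in one sport document, scores exactly as high as "goal", which appears in both (log 2 each), because the score ignores how often a term occurs once it is confined to a single category. "team", spread evenly across both categories, scores 0.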
Source: Natural Science Journal of Hainan University, CAS, 2007, No. 1, pp. 62-66 (5 pages)
Keywords: text categorization; feature dimension reduction; feature selection; mutual information

References (7)

  • [1] LIU Tao, LIU Sheng-ping, CHEN Zheng. An evaluation on feature selection for text clustering[C]// Proceedings of the 20th International Conference on Machine Learning (ICML 2003). Washington, DC, 2003: 488-495.
  • [2] YANG Yiming. A comparative study on feature selection in text categorization[C]// Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97). San Francisco: Morgan Kaufmann Publishers, 1997: 412-420.
  • [3] GALAVOTTI Luigi, SEBASTIANI Fabrizio. Feature selection and negative evidence in automated text categorization[C]// Proceedings of the ACM KDD-00 Workshop on Text Mining. New York, US: ACM Press, 2000: 40-42.
  • [4] DEERWESTER S, DUMAIS S, FURNAS G W. Indexing by latent semantic analysis[J]. Journal of the American Society for Information Science, 1990, 41(6): 391-407.
  • [5] BAKER L Douglas, MCCALLUM Andrew Kachites. Distributional clustering of words for text classification[C]// Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval. New York, US: ACM Press, 1998: 96-103.
  • [6] YANG Yiming. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval[C]// Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Dublin: Springer-Verlag, 1994: 13-22.
  • [7] ZHANG Ning, JIA Ziyan, SHI Zhongzhi. Text categorization using the KNN algorithm[J]. Computer Engineering, 2005, 31(8): 171-172.

