
Method of feature selection for text categorization with Bayesian classifiers

Cited by: 6
Abstract: Feature selection is an important preprocessing technique in text classification that can improve both the accuracy and the efficiency of a classifier. The key to feature selection in text classification is finding an effective feature evaluation metric. In general, the same metric performs differently with different classifiers, so a good metric should take the characteristics of the classifier into account. Because the Naïve Bayes classifier is simple, efficient, and highly sensitive to feature selection, research on feature selection methods designed specifically for it is important. This paper therefore proposes a feature evaluation metric for Naïve Bayes classifiers on multi-class text datasets: the Class Discriminating Measure (CDM). Text classification experiments with Naïve Bayes classifiers were carried out on two multi-class text collections. The results indicate that CDM selects features more effectively than the other feature evaluation metrics compared.
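The abstract names CDM but does not state its formula. As a rough sketch only, the Python below assumes a class-discrimination score of the form sum over classes of |log(P(w|c) / P(w|not c))|, estimated from document frequencies with add-one smoothing, and then trains a small multinomial Naïve Bayes classifier on the selected terms. The scoring form, the smoothing, the function names (`score_terms`, `select_top_k`, `train_nb`, `predict_nb`), and the toy data are all assumptions made for illustration and should be checked against the full paper.

```python
import math
from collections import Counter


def score_terms(docs, labels):
    """Score each term by sum over classes of |log(P(w|c) / P(w|not c))|.

    ASSUMPTION: this scoring form and the add-one smoothed document-frequency
    estimates are illustrative; the abstract does not define the metric.
    docs is a list of token lists, labels a parallel list of class labels.
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)                    # documents per class
    df = {c: Counter() for c in classes}        # per-class document frequency
    for tokens, c in zip(docs, labels):
        df[c].update(set(tokens))               # count each term once per document
    vocab = set().union(*df.values())
    total = len(docs)
    scores = {}
    for w in vocab:
        df_all = sum(df[c][w] for c in classes)
        s = 0.0
        for c in classes:
            p_in = (df[c][w] + 1) / (n_docs[c] + 2)
            p_out = (df_all - df[c][w] + 1) / (total - n_docs[c] + 2)
            s += abs(math.log(p_in / p_out))
        scores[w] = s
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def train_nb(docs, labels, features):
    """Multinomial Naive Bayes restricted to the selected features."""
    feats = set(features)
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        counts[c].update(t for t in tokens if t in feats)
    loglik = {}
    for c in classes:
        denom = sum(counts[c].values()) + len(feats)   # add-one smoothing
        loglik[c] = {w: math.log((counts[c][w] + 1) / denom) for w in feats}
    return prior, loglik


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    prior, loglik = model

    def log_posterior(c):
        return prior[c] + sum(loglik[c][t] for t in tokens if t in loglik[c])

    return max(prior, key=log_posterior)


# Hypothetical toy corpus: "the" occurs in every document of every class.
docs = [
    ["goal", "match", "team", "the"],
    ["team", "score", "goal", "the"],
    ["stock", "market", "price", "the"],
    ["market", "trade", "stock", "the"],
    ["election", "vote", "party", "the"],
    ["party", "vote", "policy", "the"],
]
labels = ["sport", "sport", "finance", "finance", "politics", "politics"]

scores = score_terms(docs, labels)
selected = select_top_k(scores, 6)
model = train_nb(docs, labels, selected)
print(selected)
print(predict_nb(model, ["goal", "team", "the"]))
```

On this toy corpus, a term such as "the" that is spread evenly across classes receives the lowest score and is filtered out, while terms concentrated in one class are kept. This is the filter-style behaviour the abstract describes for a classifier-aware metric, shown here under the assumptions stated above.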
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list; 2008, No. 13, pp. 24-26, 32 (4 pages)
Funding: National Natural Science Foundation of China (Grant Nos. 60503017 and 60673089)
Keywords: text classification; feature selection; text preprocessing; Naïve Bayes