Abstract
Drawing on the characteristics of feature selection schemes based on frequency distribution and on mutual information (MI), this paper improves both the traditional tf-idf factor and the MI-based feature selection method, and on that basis proposes a new combined feature selection method. Experimental results show that the proposed algorithm improves the precision of text classification.
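The abstract names two baselines, the tf-idf factor and MI-based feature selection, but does not give the improved formulas or the rule for combining them. The following is therefore only a minimal sketch of the textbook versions of those two baselines, not the authors' method. The function names, the toy corpus, and the choice of tf * log(N / df) weighting and document-count MI estimates are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Textbook tf-idf: one {term: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

def mutual_information(docs, labels, term, cls):
    """Pointwise MI between a term and a class: log(P(t, c) / (P(t) * P(c))).

    Probabilities are estimated from document counts (an assumption).
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == cls)
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    if n_tc == 0:
        return float("-inf")  # term never occurs in this class
    return math.log(n_tc * n / (n_t * n_c))

# Hypothetical toy corpus: tokenized documents and their class labels.
docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

weights = tf_idf(docs)
mi_goal = mutual_information(docs, labels, "goal", "sports")
mi_team = mutual_information(docs, labels, "team", "sports")
```

In this toy corpus, "goal" appears only in sports documents and so scores a higher MI with the sports class than "team", which appears in both classes. A combined method of the kind the abstract describes would merge a frequency-based score with an MI-based score, but how the authors do so is not stated here.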
Source
《广西师范大学学报(自然科学版)》
CAS
PKU Core (Peking University Core Journals)
2007, Issue 4, pp. 208-211 (4 pages)
Journal of Guangxi Normal University: Natural Science Edition
Funding
National Natural Science Foundation of China (Grant No. 70571087)
Keywords
feature selection
text categorization
feature weight
mutual information (MI)