Abstract
Feature selection is an active research topic, particularly in text categorization. The χ² statistic has two defects: it lowers the weight of low-frequency words, and it raises, for a given class, the weight of features that rarely occur in that class but are common in the other classes. This paper improves the χ² statistic to address both defects and compares the method before and after the improvement in simulation and comparison experiments on text classification. In these experiments, the improved method classifies better than the traditional one.
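The second defect can be illustrated with the standard χ² statistic computed from a 2x2 term/class contingency table. The sketch below is a minimal illustration, not the paper's improved method (which the abstract does not specify); the function name `chi_square` and the document counts are hypothetical. Because the statistic squares the term (AD - CB), a feature that is rare in the class but common elsewhere can score as high as a feature that is genuinely typical of the class.

```python
def chi_square(a, b, c, d):
    """Standard chi-square score of a term for one class (2x2 table).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical counts: 100 documents in the class, 100 outside it.
typical = chi_square(90, 10, 10, 90)   # term common in the class
atypical = chi_square(10, 90, 90, 10)  # term rare in the class, common elsewhere

print(typical, atypical)  # 128.0 128.0
```

Both terms receive the same score, although only the first is characteristic of the class. The sign of (AD - CB) separates the two cases, but the standard statistic discards it by squaring.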
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2009, No. 14, pp. 136-137, 140 (3 pages)
Computer Engineering and Applications
Keywords
text categorization
feature selection
χ² statistics