Abstract
Mutual information-based feature selection performs poorly on unbalanced corpora because it does not combine positively and negatively correlated features well. To address this, a balance factor is used to adjust the ratio of positively to negatively correlated features, strengthening the role of negatively correlated features during selection. A feature distribution difference factor is also introduced to distinguish features that are strongly related to a category, which improves classification performance. Experimental results show that the improved mutual information-based feature selection method is feasible and effective.
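The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It computes the standard pointwise mutual information between a term and a category from document counts, then combines the strongest positive and strongest negative association with a weight. The function names, the `alpha` weighting scheme, and the `eps` smoothing term are all assumptions made for illustration; the feature distribution difference factor is not modeled here.

```python
import math


def pmi(a, b, c, n, eps=0.5):
    """Pointwise mutual information between a term t and a category k.

    a: documents in k that contain t
    b: documents outside k that contain t
    c: documents in k that do not contain t
    n: total number of documents
    eps: smoothing added to `a` so that a == 0 does not yield log(0)
         (an assumption of this sketch, not taken from the paper)

    Assumes the term occurs somewhere (a + b > 0) and the category is
    non-empty (a + c > 0).
    """
    return math.log((a + eps) * n / ((a + c) * (a + b)))


def balanced_score(pmi_values, alpha):
    """Combine positive and negative term-category associations.

    pmi_values: PMI of one term against every category
    alpha: hypothetical balance factor in [0, 1]; alpha = 1 keeps only the
           strongest positive association, smaller values give more weight
           to the strongest negative association
    """
    strongest_pos = max((v for v in pmi_values if v > 0), default=0.0)
    strongest_neg = max((-v for v in pmi_values if v < 0), default=0.0)
    return alpha * strongest_pos + (1 - alpha) * strongest_neg


# Example: one term scored against two categories in a 40-document corpus.
scores = [pmi(20, 0, 0, 40), pmi(1, 19, 19, 40)]
term_score = balanced_score(scores, alpha=0.6)
```

With `alpha = 1` this reduces to the usual practice of ranking terms by their maximum positive mutual information, which is the setting in which negatively correlated terms are effectively ignored.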
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2010, Issue 34, pp. 123-125 (3 pages)
Computer Engineering and Applications
Funding
Aeronautical Science Foundation project (No. 2006ZC31001)
Keywords
text categorization
feature selection
mutual information
balance factor
feature distribution difference