Abstract
Traditional associative text classification algorithms generate a huge number of rules. Leaving the rules unpruned hurts classification efficiency, while earlier pruning methods reduce classification accuracy to varying degrees. This paper therefore proposes pruning the rules of each class by mutual information: the rules with the strongest classifying ability are selected to form the classifier, which is then used to classify unlabeled texts. After pruning, the number of rules drops substantially, and the resulting classifier achieves better classification results than both a classifier built from the unpruned rule set and the ARC-BC algorithm, which uses an earlier pruning method. Extensive experiments show that the method is effective.
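The per-class pruning step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact mutual information formula or the selection criterion, so the sketch assumes pointwise mutual information between "the document contains every term of the rule" and "the document belongs to the class", and keeps a fixed number of rules per class. The names `mutual_information`, `prune_rules`, and `top_k`, and the toy data, are hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(rule_terms, target_class, docs):
    """Pointwise MI between 'doc contains all rule terms' and 'doc is in class'.

    Assumption: the paper may define the score differently.
    """
    n = len(docs)
    n_rule = sum(1 for terms, _ in docs if rule_terms <= terms)
    n_class = sum(1 for _, label in docs if label == target_class)
    n_both = sum(
        1 for terms, label in docs
        if label == target_class and rule_terms <= terms
    )
    if n_both == 0:
        # The rule never fires on this class, so it ranks last.
        return float("-inf")
    return math.log2(n_both * n / (n_rule * n_class))


def prune_rules(rules, docs, top_k):
    """Keep the top_k highest-scoring rules of each class (top_k is an assumption)."""
    by_class = defaultdict(list)
    for antecedent, label in rules:
        score = mutual_information(antecedent, label, docs)
        by_class[label].append((score, antecedent))
    pruned = {}
    for label, scored in by_class.items():
        # Sort on the score only; ties keep their original order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        pruned[label] = [antecedent for _, antecedent in scored[:top_k]]
    return pruned


# Toy training set: (set of terms in the document, class label).
docs = [
    ({"goal", "match"}, "sports"),
    ({"goal", "team"}, "sports"),
    ({"stock", "market"}, "finance"),
    ({"stock", "team"}, "finance"),
]

# Candidate rules: (antecedent term set, predicted class).
rules = [
    (frozenset({"goal"}), "sports"),
    (frozenset({"team"}), "sports"),
    (frozenset({"stock"}), "finance"),
    (frozenset({"team"}), "finance"),
]

pruned = prune_rules(rules, docs, top_k=1)
```

In this toy data the term "team" appears equally often in both classes, so its rules score zero and are dropped, while the "goal" and "stock" rules survive for their respective classes.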
Source
Journal of Nanjing Normal University (Engineering and Technology Edition) (《南京师范大学学报(工程技术版)》)
CAS
2008, No. 4, pp. 173-177 (5 pages)
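Funding
Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education
Open Project Fund of the Institute of Software, Chinese Academy of Sciences (SYSKF0701)
Fuzhou University Science and Technology Development Fund (2005-XQ-13)
Fujian Provincial Department of Education Fund (JB06023)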
Keywords
mutual information, rule pruning, associative classification