Abstract
The accuracy and the risk of classification are key factors in evaluating an e-mail system, and spam filtering is a special application of text categorization. This paper introduces the covering algorithm (CA) from neural networks into spam filtering, runs e-mail classification experiments with several feature reduction methods, and compares the results with SVM. The experiments indicate that a classifier combining the covering algorithm with appropriate feature selection and reduction methods is an effective way to build a spam filter. In addition, because spam filtering in practical use must keep risk to a minimum, the paper analyzes, from the perspective of risk, how the covering algorithm classifies test samples. Based on this analysis, it proposes an improvement to the way the cross cover algorithm handles rejected samples: changing the range of influence of the covers that belong to non-spam (normal) mail reduces the risk of spam filtering.
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2009, No. 8, pp. 217-219, 253 (4 pages)
Funding
National Natural Science Foundation of China (60675031)
National Basic Research Program of China (973 Program) (2004CB318108, 2007BC311003)
Keywords
Spam filtering, Covering algorithm, Feature selection, Feature reduction