Abstract
Spam filters suffer from low accuracy and stability, and e-mail filtering algorithms produce false negatives and false positives when classifying a corpus. To address these problems, this paper proposes a rough-set-based e-mail filtering algorithm with decision rule boundaries (RARM). The algorithm applies rough set theory to analyze the corpus directly and uses a heuristic method to derive an execution plan for three different rough-set decision rules, so that a certain level of classification accuracy is maintained even when the lexical semantics of a message are ambiguous. In simulation experiments, the algorithm was compared with e-mail filtering algorithms based on support vector machines (SVM), AdaBoost, and Bayesian classification, and it achieved higher spam-filtering accuracy than these baselines.
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal
2015, Issue 1, pp. 258-261 (4 pages)
Funding
Henan Province Science and Technology Research Project (122102210563, 132102210215)
Keywords
spam filtering
rough set
heuristic methods
decision rule boundary