Abstract
The Naïve Bayes algorithm is widely used for content-based spam filtering, but the traditional form has two weaknesses: classifying a message from its content alone is uncertain, and the message representation is incomplete. To address these weaknesses, this paper analyzes how the fields of the e-mail header differ between ham and spam, extracts noncharacteristic information from them, and improves the Naïve Bayes algorithm by combining feature information with this noncharacteristic information. Experimental results show that, compared with using feature information alone, the improved Naïve Bayes classifier achieves higher spam recall and precision, because it covers more of the information in a message and compensates for the shortcomings of purely content-based judgment.
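The idea of combining content features with header-derived information can be illustrated with a minimal sketch. The abstract does not list the header attributes or the exact way the paper combines the two kinds of information, so everything below is an assumption made for illustration: the three header indicators in `header_flags`, the toy training messages, and the choice to treat header indicators as extra tokens in a multinomial Naïve Bayes model with Laplace smoothing.

```python
import math
from collections import Counter


def header_flags(headers):
    """Hypothetical header-derived indicators (not taken from the paper)."""
    flags = []
    if not headers.get("Message-ID"):
        flags.append("hdr:no_message_id")
    reply_to = headers.get("Reply-To")
    if reply_to and reply_to != headers.get("From"):
        flags.append("hdr:reply_to_differs")
    if not headers.get("To"):
        flags.append("hdr:empty_to")
    return flags


def extract(mail):
    """Body tokens (feature information) plus header indicators."""
    return mail["body"].lower().split() + header_flags(mail["headers"])


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, mails, labels):
        self.classes = sorted(set(labels))
        self.doc_count = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for mail, label in zip(mails, labels):
            self.counts[label].update(extract(mail))
        self.vocab = set().union(*self.counts.values())
        return self

    def log_scores(self, mail):
        n_docs = sum(self.doc_count.values())
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.doc_count[c] / n_docs)  # log prior
            for f in extract(mail):
                if f in self.vocab:  # features never seen in training are skipped
                    score += math.log(
                        (self.counts[c][f] + 1) / (total + len(self.vocab))
                    )
            scores[c] = score
        return scores

    def predict(self, mail):
        scores = self.log_scores(mail)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch.
train_mails = [
    {"headers": {"From": "a@x.org", "To": "b@x.org", "Message-ID": "<1@x.org>"},
     "body": "meeting agenda attached"},
    {"headers": {"From": "c@x.org", "To": "b@x.org", "Message-ID": "<2@x.org>"},
     "body": "project report draft"},
    {"headers": {"From": "w@s.biz", "Reply-To": "z@q.biz", "To": ""},
     "body": "cheap pills offer"},
    {"headers": {"From": "v@s.biz", "Reply-To": "y@q.biz"},
     "body": "win money offer"},
]
train_labels = ["ham", "ham", "spam", "spam"]

model = NaiveBayes().fit(train_mails, train_labels)

# The body words are unseen in training, so only the header indicators
# contribute evidence for this message.
neutral_body = {"headers": {"From": "k@s.biz", "Reply-To": "m@q.biz"},
                "body": "hello there"}
print(model.predict(neutral_body))
```

In this toy setup, a message whose body carries no known words can still be classified from its header indicators, which is the kind of gap in content-only filtering that the abstract describes.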
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2011, No. 2, pp. 514-516 (3 pages)
Application Research of Computers
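Funding
National Natural Science Foundation of China (60873247)
Shandong Province High-Tech Independent Innovation Special Project (2008ZZ28)
Key Program of the Natural Science Foundation of Shandong Province (ZR2009GZ007)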
Keywords
e-mail filtering
noncharacteristic information
feature information
Naïve Bayes algorithm