Abstract
Research on filtering junk mail has long been an active topic in computer science. To meet the specific requirements of a junk mail classification system, this paper introduces machine learning on top of traditional rule-based classification. It presents the system architecture and the feature extraction algorithm, tests a method for computing the posterior probability of each class for a new email, and discusses in detail the design of a personalized junk mail classifier based on the Naive Bayes method. Using the word segmentation algorithm, the TFIDF feature subset extraction algorithm, and the Naive Bayes method, the system classifies emails with good accuracy. Because Naive Bayes classifies each new email as it arrives, classification is also fast.
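The pipeline named in the abstract (TFIDF feature subset, then a Naive Bayes posterior for each new email) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy corpus, the "sum of TF-IDF weights" ranking rule, and the Laplace (add-one) smoothing are all assumptions made for illustration, and word segmentation is assumed to have already produced the token lists.

```python
import math
from collections import Counter


def tfidf_feature_subset(docs, k):
    """Pick the k terms with the highest summed TF-IDF weight (assumed ranking rule)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per document
    scores = Counter()
    for tokens in docs:
        for term, count in Counter(tokens).items():
            scores[term] += (count / len(tokens)) * math.log(n / df[term])
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def train_naive_bayes(docs, labels, vocab):
    """Estimate log priors and log likelihoods with add-one smoothing (an assumption)."""
    classes = sorted(set(labels))
    priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    likelihoods = {}
    for c in classes:
        counts = Counter()
        for tokens, label in zip(docs, labels):
            if label == c:
                counts.update(t for t in tokens if t in vocab)
        total = sum(counts.values()) + len(vocab)
        likelihoods[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return priors, likelihoods


def posterior(tokens, model):
    """Return P(class | tokens) for every class; terms outside the vocabulary are ignored."""
    priors, likelihoods = model
    log_joint = {
        c: priors[c] + sum(likelihoods[c][t] for t in tokens if t in likelihoods[c])
        for c in priors
    }
    peak = max(log_joint.values())  # subtract the maximum before exponentiating
    norm = sum(math.exp(v - peak) for v in log_joint.values())
    return {c: math.exp(v - peak) / norm for c, v in log_joint.items()}


# Hypothetical, already-segmented training mail; "the" occurs in every message.
train_docs = [
    ["win", "free", "prize", "the"],
    ["free", "offer", "the", "now"],
    ["meeting", "the", "agenda", "monday"],
    ["the", "project", "report", "monday"],
]
train_labels = ["spam", "spam", "ham", "ham"]

vocab = tfidf_feature_subset(train_docs, k=10)
model = train_naive_bayes(train_docs, train_labels, vocab)
result = posterior(["free", "prize", "the", "now"], model)
```

In this toy corpus the term "the" appears in every message, so its inverse document frequency is zero and it falls out of the selected subset. Computing the posterior in log space and normalizing at the end avoids numeric underflow on long messages, and it needs only one pass over the tokens of the new email, which is consistent with the abstract's point about classifying mail on arrival.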
Source
《盐城工学院学报(自然科学版)》
CAS
2008, No. 2, pp. 47-50 (4 pages)
Journal of Yancheng Institute of Technology: Natural Science Edition