Abstract
To improve the precision and recall of Bayesian spam filters, an improved weighted Bayes model (IWB) is proposed, which improves spam filtering performance by making the Bayes model more accurate. Unlike the naive Bayes model (NB), which assumes that all features of a mail sample are independent and equally important, IWB assigns a weight to each feature of a mail sample to reduce the mismatch error between the Bayes model and reality. An objective function based on the least-squares method is built from the Bayes formula and used to optimize the weight vector of IWB. Because this objective function is nonlinear and high-dimensional, a novel particle swarm optimization (PSO) algorithm is proposed to obtain an approximately globally optimal weight vector, and hence the optimal Bayes model. Simulations comparing NB, the conventional weighted Bayes model (WB), and IWB show that IWB significantly improves spam filtering performance, increasing both precision and recall.
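The abstract gives no formulas, so the following Python/NumPy sketch is only one plausible reading of the approach, not the authors' method. It assumes binary word-presence features, a weighted Bayes posterior P(c|x) ∝ P(c) · ∏ P(x_i|c)^w_i (all w_i = 1 recovers plain NB), and a least-squares objective that sums the squared differences between the predicted spam posterior and the 0/1 labels. The optimizer is a standard global-best PSO, swapped in because the paper's novel PSO variant is not described here. The function names (`fit_likelihoods`, `spam_posterior`, `lsq_objective`, `pso`), the weight bounds [0, 2], and the PSO coefficients are all hypothetical choices.

```python
import numpy as np

# Hypothetical sketch: NOT the paper's IWB model or its novel PSO.

def fit_likelihoods(X, y, alpha=1.0):
    """Class priors and Laplace-smoothed P(x_i = 1 | c) for binary features."""
    priors = np.array([np.mean(y == c) for c in (0, 1)])
    cond = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                     for c in (0, 1)])  # shape (2, n_features)
    return priors, cond

def spam_posterior(X, w, priors, cond):
    """Weighted Bayes: log P(c|x) = log P(c) + sum_i w_i * log P(x_i|c) + const."""
    # log P(x_i | c) evaluated at the observed 0/1 value of each feature
    log_lik = [X * np.log(cond[c]) + (1 - X) * np.log(1 - cond[c]) for c in (0, 1)]
    scores = np.stack([np.log(priors[c]) + log_lik[c] @ w for c in (0, 1)], axis=1)
    scores -= scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    probs = np.exp(scores)
    return probs[:, 1] / probs.sum(axis=1)  # normalized P(spam | x)

def lsq_objective(w, X, y, priors, cond):
    """Assumed least-squares objective: squared error of P(spam|x) vs labels."""
    return np.sum((y - spam_posterior(X, w, priors, cond)) ** 2)

def pso(f, dim, n_particles=30, iters=200, lo=0.0, hi=2.0,
        inertia=0.7, c1=1.5, c2=1.5, x0=None, seed=0):
    """Standard global-best PSO minimizing f over the box [lo, hi]^dim."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, (n_particles, dim))
    if x0 is not None:
        pos[0] = x0  # optionally seed one particle, e.g. the NB weights w = 1
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    g_val = pbest_val.min()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity: inertia + pull toward personal best + pull toward global best
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < g_val:
            g_val = pbest_val.min()
            g = pbest[pbest_val.argmin()].copy()
    return g, g_val
```

Seeding one particle at w = 1 guarantees that the returned training objective is no worse than that of plain NB. It says nothing about precision or recall on held-out mail.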
Source
Computer Measurement & Control (《计算机测量与控制》)
Peking University core journal
2013, No. 8, pp. 2181-2184 (4 pages)
Funding
Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20106121110003)