
Novel spam filtering method based on fuzzy adaptive particle swarm optimization

Cited by: 7
Abstract A novel spam filtering method (NSFM) is proposed. It removes redundant features from the high-dimensional text feature space and keeps the features that contribute to classification accuracy, thereby improving the classification accuracy of spam filtering. A fuzzy adaptive particle swarm optimization algorithm (IFAPSO) is also proposed, which uses fuzzy control to dynamically adjust the inertia weight, learning factors, and particle number ratio of the swarm. NSFM consists of three stages: core (kernel) feature selection, feature selection, and spam filtering. In the first stage, information gain is used to compute an information value for each feature, a core feature set is built, and a number of core feature subsets are generated. In the second stage, IFAPSO is initialized from the core feature subsets and a fuzzy controller adaptively adjusts the swarm to complete feature selection. In the third stage, a support vector machine classifies messages using the optimal feature subset, completing spam filtering. The experiments use the PU1, Ling-Spam, and SpamAssassin data sets. Comparative experiments show that the method is highly adaptive, selects better feature subsets, and effectively improves classification accuracy and spam filtering performance, giving it high practical value.
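The first stage described in the abstract scores each feature by information gain. The record gives no formulas, so the following is only a minimal sketch under assumptions: features are treated as binary term presence, and the function names (`entropy`, `information_gain`, `top_k_features`) and the four toy messages are hypothetical illustrations, not the authors' implementation or data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_present, labels):
    """IG of one binary feature: H(labels) - H(labels | feature)."""
    with_f = [y for x, y in zip(feature_present, labels) if x]
    without_f = [y for x, y in zip(feature_present, labels) if not x]
    n = len(labels)
    conditional = ((len(with_f) / n) * entropy(with_f)
                   + (len(without_f) / n) * entropy(without_f))
    return entropy(labels) - conditional

def top_k_features(docs, labels, k):
    """Rank vocabulary terms by information gain and keep the k best.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = sorted({term for doc in docs for term in doc})
    scored = [(information_gain([term in doc for doc in docs], labels), term)
              for term in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]

# Toy data (hypothetical): each message is a set of terms.
docs = [{"free", "money"}, {"free", "offer"},
        {"meeting", "notes"}, {"project", "meeting"}]
labels = ["spam", "spam", "ham", "ham"]
core_features = top_k_features(docs, labels, k=2)
```

In this toy set, "free" and "meeting" each split the classes perfectly, so they rank above terms that appear in only one message. How the paper turns the ranked core set into several core feature subsets is not stated in the abstract and is left out here.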
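Source Journal of Jilin University (Engineering and Technology Edition), 2011, No. 3, pp. 716-720 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding National Natural Science Foundation of China (60971089); National Electronics Development Fund project (Caijian [2009] No. 537); Jilin Provincial Science and Technology Department project (20090502).
Keywords artificial intelligence; feature selection; particle swarm optimization; fuzzy control; spam filtering; support vector machines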
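The second stage in the abstract selects features with a particle swarm whose parameters are tuned at run time. As a rough illustration only, the sketch below runs a standard binary PSO over feature masks and adjusts just the inertia weight with a simple linear rule. That rule (`adapt_inertia`) is a hypothetical stand-in for the paper's fuzzy controller, which the abstract does not specify. The adaptation of learning factors and particle number ratio is omitted, and `toy_fitness` replaces the SVM-based evaluation an actual run would need.

```python
import math
import random

def adapt_inertia(improvement, w_min=0.4, w_max=0.9):
    """Hypothetical stand-in for the fuzzy controller.

    Maps the latest improvement of the global best fitness (clamped to
    [0, 1]) to an inertia weight: little improvement gives a large weight
    (explore), large improvement gives a small weight (exploit).
    """
    improvement = min(1.0, max(0.0, improvement))
    return w_max - (w_max - w_min) * improvement

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def binary_pso(fitness, n_features, n_particles=10, n_iters=30,
               c1=2.0, c2=2.0, seed=0):
    """Maximize fitness(mask) over 0/1 feature masks with binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    w = 0.9  # initial inertia weight
    for _ in range(n_iters):
        prev_best = gbest_fit
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp velocity
                # Sigmoid of the velocity is the probability of bit = 1.
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        w = adapt_inertia(gbest_fit - prev_best)
    return gbest, gbest_fit

USEFUL = {0, 2}  # hypothetical indices of the informative features

def toy_fitness(mask):
    """Reward selecting the useful features, penalize large subsets."""
    hits = sum(mask[i] for i in USEFUL) / len(USEFUL)
    size_penalty = 0.1 * sum(mask) / len(mask)
    return hits - size_penalty

best_mask, best_fit = binary_pso(toy_fitness, n_features=6)
```

In a setting closer to the abstract, the initial positions would be seeded from the core feature subsets rather than drawn at random, and the fitness would be the classification accuracy of a support vector machine trained on the selected features.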

References (8)

  • 1. Guzella T S, Caminhas W M. A review of machine learning approaches to spam filtering[J]. Expert Systems with Applications, 2009, 36(7): 10206-10222. Cited by: 1
  • 2. Blanzieri E, Bryl A. A survey of learning-based techniques of email spam filtering[J]. Artificial Intelligence Review, 2008, 29(1): 63-92. Cited by: 1
  • 3. Zheleva E, Kolcz A, Getoor L. Trusting spam reporters: a reporter-based reputation system for email filtering[J]. ACM Transactions on Information Systems, 2009, 27(1): 1-37. Cited by: 1
  • 4. Chen J N, Huang H K. Feature selection for text classification with naive Bayes[J]. Expert Systems with Applications, 2009, 36(3): 5432-5435. Cited by: 1
  • 5. Liu Jie, Jin Di, Du Huijun, Liu Dayou. A new hybrid feature selection method RRK[J]. Journal of Jilin University (Engineering and Technology Edition), 2009, 39(2): 419-423. Cited by: 7
  • 6. Bajpai P, Singh S N. Fuzzy adaptive particle swarm optimization for bidding strategy in uniform price spot market[J]. IEEE Transactions on Power Systems, 2007, 22(4): 2152-2160. Cited by: 1
  • 7. Niknam T. A new fuzzy adaptive hybrid particle swarm optimization algorithm for non-linear, non-smooth and non-convex economic dispatch problem[J]. Applied Energy, 2010, 87(1): 327-339. Cited by: 1
  • 8. Niknam T, Mojarrad H D. A new fuzzy adaptive particle swarm optimization for non-smooth economic dispatch[J]. Energy, 2010, 35(4): 1764-1778. Cited by: 1

