Feature Extraction Algorithm in Chinese Spam Filtering Systems (times cited: 1)

English title: Feature Selection Method in Chinese Spam Filtering
Abstract: Targeting spam filtering, the paper first applies word segmentation and preprocessing to collected spam and legitimate emails and builds text vectors, then reduces the vector dimensionality with four commonly used feature-word extraction methods. On this basis it proposes a combined feature-word extraction algorithm: following the ranking produced by each evaluation function, the first n feature words of their intersection are taken as candidate words for classification tests. Simulations compare how n affects the classification results of each algorithm, which verifies the effectiveness of the proposed algorithm.
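The combined selection step described in the abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the paper's code: the abstract does not name its four evaluation functions, so the sketch assumes four common ones (document frequency DF, information gain IG, chi-square CHI, mutual information MI); it expects mails that an upstream Chinese word segmenter has already split into tokens; and it reads "the first n words of the intersection" as enlarging a shared cutoff k until the top-k lists of all functions share at least n words, then ordering those shared words by summed rank. The function names `term_stats`, `score_all` and `combined_top_n` are hypothetical.

```python
import math


def term_stats(docs, labels):
    """Count how many spam (A) and legitimate (B) mails contain each term.

    docs: token lists from an upstream Chinese word segmenter (assumed, not shown).
    labels: 1 for spam, 0 for legitimate mail.
    """
    stats = {}
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            a, b = stats.get(t, (0, 0))
            stats[t] = (a + 1, b) if y == 1 else (a, b + 1)
    n_spam = sum(labels)
    return stats, n_spam, len(labels) - n_spam


def entropy(*counts):
    """Shannon entropy (natural log) of a class distribution given raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counts if c)


def score_all(stats, n_spam, n_ham):
    """Score every term with four ASSUMED evaluation functions: DF, IG, CHI, MI."""
    n = n_spam + n_ham
    df, ig, chi, mi = {}, {}, {}, {}
    for t, (a, b) in stats.items():
        c, d = n_spam - a, n_ham - b  # mails of each class WITHOUT the term
        df[t] = a + b
        ig[t] = (entropy(n_spam, n_ham)
                 - (a + b) / n * entropy(a, b)
                 - (c + d) / n * entropy(c, d))
        denom = n_spam * n_ham * (a + b) * (c + d)
        chi[t] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
        # MI toward whichever class the term co-occurs with most strongly.
        mi[t] = max(math.log(x * n / (cls * (a + b)))
                    for x, cls in ((a, n_spam), (b, n_ham)) if x)
    return [df, ig, chi, mi]


def combined_top_n(scores_by_method, n):
    """Return the first n words of the intersection of all per-method rankings.

    ASSUMED reading of the abstract: enlarge a shared cutoff k until the top-k
    lists of every method share >= n words, then order the shared words by
    their summed rank positions (lower is better), ties broken by the word.
    """
    rankings = [sorted(s, key=lambda t: (-s[t], t)) for s in scores_by_method]
    n = min(n, len(rankings[0]))
    for k in range(n, len(rankings[0]) + 1):
        common = set(rankings[0][:k]).intersection(*(r[:k] for r in rankings[1:]))
        if len(common) >= n:
            break
    pos = [{t: i for i, t in enumerate(r)} for r in rankings]
    return sorted(common, key=lambda t: (sum(p[t] for p in pos), t))[:n]


if __name__ == "__main__":
    docs = [["免费", "中奖", "点击"], ["免费", "优惠", "点击"],   # spam
            ["会议", "报告", "免费"], ["会议", "下午", "报告"]]   # legitimate
    labels = [1, 1, 0, 0]
    stats, n_spam, n_ham = term_stats(docs, labels)
    print(combined_top_n(score_all(stats, n_spam, n_ham), 2))  # ['会议', '报告']
```

In this toy run MI favours rare words, so the one-off word 下午 enters the shared cutoff at k = 5, but summing ranks across all four functions places it behind 会议 and 报告. A different cutoff or tie-breaking rule would be an equally plausible reading of the abstract.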
Source: Computer Systems & Applications (计算机系统应用), 2012, No. 3, pp. 106-110, 5 pages
Keywords: spam filtering; mail preprocessing; feature extraction (feature selection method); Rocchio method; evaluation indicator
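The keywords name the Rocchio method, presumably the classifier behind the abstract's classification tests, although the abstract itself gives no details. Below is a minimal sketch of a textbook Rocchio (prototype-vector) classifier over an already selected list of feature words. The length-normalised term-frequency weighting, the defaults beta = 16 and gamma = 4, the cosine decision rule, and all function names (`vectorize`, `rocchio_prototypes`, `classify`) are assumptions for illustration, not the paper's settings.

```python
import math


def vectorize(tokens, features):
    """Length-normalised term-frequency vector over the selected feature words.

    The weighting scheme is an assumption; the paper may use another one.
    """
    v = [float(tokens.count(f)) for f in features]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def centroid(vectors, dim):
    """Component-wise mean; a zero vector if the class has no examples."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_prototypes(vectors, labels, beta=16.0, gamma=4.0):
    """One prototype per class: beta * centroid(class) - gamma * centroid(rest).

    beta=16, gamma=4 are assumed defaults, not values taken from the paper.
    """
    dim = len(vectors[0])
    protos = {}
    for cls in set(labels):
        pos = [v for v, y in zip(vectors, labels) if y == cls]
        neg = [v for v, y in zip(vectors, labels) if y != cls]
        cp, cn = centroid(pos, dim), centroid(neg, dim)
        protos[cls] = [beta * p - gamma * q for p, q in zip(cp, cn)]
    return protos


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def classify(tokens, features, protos):
    """Assign the class whose prototype is most cosine-similar to the mail."""
    x = vectorize(tokens, features)
    return max(protos, key=lambda cls: cosine(x, protos[cls]))


if __name__ == "__main__":
    features = ["会议", "报告", "免费", "点击"]  # e.g. output of a selection step
    train = [["免费", "中奖", "点击"], ["免费", "优惠", "点击"],
             ["会议", "报告", "免费"], ["会议", "下午", "报告"]]
    labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate
    protos = rocchio_prototypes([vectorize(d, features) for d in train], labels)
    print(classify(["免费", "点击", "优惠"], features, protos))  # 1 (spam)
```

Subtracting the gamma-weighted centroid of the other class pushes each prototype away from words shared by both classes, which is the usual motivation for the Rocchio form over a plain centroid.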