LZ complexity similarity based spam detection (cited by: 3)
Abstract: A spam detection method is proposed based on the LZ complexity similarity between symbolic sequences and the K nearest neighbor rule. Compared with approaches based on the vector space model, computing the LZ complexity similarity between email texts requires neither text preprocessing nor feature extraction. In addition, the lazy learning characteristic of the K nearest neighbor rule suits application environments in which the spam sample set must be adjusted dynamically. The method was evaluated on the Ling-Spam email corpus using 10-fold cross-validation, and its overall detection performance is better than that of some statistical and machine learning methods based on the vector space model.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 29, pp. 176-178 (3 pages)
Keywords: spam; LZ complexity similarity; K nearest neighbor rule
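
The abstract describes the approach only in outline, so the following is a minimal sketch of how such a classifier could look, not the authors' implementation. It assumes the Lempel-Ziv (1976) exhaustive-history complexity c(S) and one commonly used normalized distance, max{c(SQ) - c(S), c(QS) - c(Q)} / max{c(S), c(Q)}; whether the paper uses exactly this formula is not stated in the abstract. The function names `lz_complexity`, `lz_distance`, and `knn_classify` are hypothetical.

```python
from collections import Counter


def lz_complexity(s: str) -> int:
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the current phrase while it can still be copied from the
        # text seen so far (the copy may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(a: str, b: str) -> float:
    """Assumed normalized LZ complexity distance (smaller means more similar)."""
    ca, cb = lz_complexity(a), lz_complexity(b)
    if max(ca, cb) == 0:
        return 0.0  # both strings are empty
    extra_ab = lz_complexity(a + b) - ca  # new phrases b adds after a
    extra_ba = lz_complexity(b + a) - cb  # new phrases a adds after b
    return max(extra_ab, extra_ba) / max(ca, cb)


def knn_classify(message: str, labeled: list[tuple[str, str]], k: int = 3) -> str:
    """Label a message by majority vote among its k nearest labeled samples."""
    nearest = sorted(labeled, key=lambda item: lz_distance(message, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because the K nearest neighbor rule stores samples rather than fitting a model, updating the spam sample set in this sketch only means appending to or removing from the `labeled` list. The raw message text is passed in directly, with no tokenization or feature extraction. The substring search makes this version quadratic or worse in message length, so it is meant for illustration rather than for a full corpus.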