LZ complexity similarity based spam detection (cited by: 3)


Abstract: A spam detection method is proposed that combines the LZ complexity similarity between symbolic sequences with the K-nearest-neighbor rule. Unlike approaches based on the vector space model, computing the LZ complexity similarity between email texts requires neither text preprocessing nor feature extraction. In addition, the lazy-learning nature of the K-nearest-neighbor rule suits application environments in which the spam sample set must be adjusted dynamically. The method was evaluated on the Ling-Spam email corpus using 10-fold cross-validation; its overall detection performance is better than that of some statistical and machine learning methods based on the vector space model.
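The pipeline outlined in the abstract (LZ complexity computed on raw text, a pairwise measure derived from it, then a K-nearest-neighbor vote) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the similarity formula, so `lz_distance` is an assumption: it uses one commonly cited normalized LZ distance, and nearest neighbors are taken as those with the smallest distance. The function names `lz_complexity`, `lz_distance`, and `knn_classify` are hypothetical.

```python
from collections import Counter


def lz_complexity(s: str) -> int:
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    n = len(s)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from a start position
        # before i (the copy may overlap the phrase being built).
        while i + length <= n and s.find(s[i:i + length], 0, i + length - 1) != -1:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(a: str, b: str) -> float:
    """Assumed normalized distance: how many new phrases each text adds to the other."""
    ca, cb = lz_complexity(a), lz_complexity(b)
    if max(ca, cb) == 0:
        return 0.0  # both texts are empty
    return max(lz_complexity(a + b) - ca, lz_complexity(b + a) - cb) / max(ca, cb)


def knn_classify(message: str, labeled: list[tuple[str, str]], k: int = 3) -> str:
    """Majority vote among the k labeled messages closest to `message`."""
    nearest = sorted(labeled, key=lambda pair: lz_distance(message, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because the classifier keeps no trained model, updating the spam sample set only means adding or removing `(text, label)` pairs from `labeled`, which is the lazy-learning property the abstract refers to. The parsing shown here is quadratic or worse in the text length, so it is suited to illustration rather than to large corpora.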

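Authors: Li Bin, Li Yibing, He Hongbo

Affiliation: School of Information Science and Engineering, Central South University

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2007, Issue 29, pp. 176-178 (3 pages)

Keywords: spam; LZ complexity similarity; K nearest neighbor rule

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]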
