
Research on Topic-Specific Email Filtering Based on LSI and SVM Classification (cited by 1)

Research for Intelligent and Customized Email Filtering Based on Latent Semantic Indexing and Support Vector Machine
Abstract: Latent Semantic Indexing (LSI) is an effective method for information retrieval (IR) and has also been applied successfully to text classification. LSI addresses the problems of synonymy and polysemy: by reducing the noise in the raw document-term matrix, it makes the semantic relations between terms and documents more apparent. To identify and filter harmful, unsolicited, topic-specific messages and email in a bilingual (Chinese and English) setting, this paper proposes a customized email filtering system based on an improved LSI approach. The system represents the predefined categories of information to be filtered with a latent semantic model and, using singular value decomposition (SVD) together with supervised learning from positive examples, applies a Support Vector Machine (SVM) to recognize and classify the predefined, customized information. Experimental results show that SVM classification with LSI-based feature selection is a more effective approach to information identification and text classification: it achieves good filtering performance while greatly reducing computational complexity.
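The abstract describes a pipeline in which LSI (a rank-k truncated SVD of the term-document matrix, A ≈ U_k Σ_k V_kᵀ) supplies reduced features to an SVM. The following is a minimal Python/NumPy sketch of that general pipeline only; it is not the authors' "improved LSI" method or their feature selection, neither of which the abstract specifies. Assumptions: toy English data, whitespace tokenization (Chinese text would first need word segmentation), raw term frequencies with no weighting, k = 2, new messages folded in as qᵀ U_k Σ_k⁻¹, and a Pegasos-style linear SVM trained by stochastic subgradient descent standing in for whatever SVM solver the paper used. For simplicity the sketch trains on both positive and negative examples. All function names and data are hypothetical.

```python
import numpy as np


def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix A (rows = terms, columns = documents)."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        # Whitespace tokens only; Chinese text would need word segmentation first.
        for token in doc.split():
            if token in index:
                A[index[token], j] += 1.0
    return A


def lsi(A, k):
    """Rank-k truncated SVD: A ~= U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is in descending order
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(Q, U_k, s_k):
    """Map term vectors (columns of Q) into latent space: Q^T U_k diag(s_k)^-1."""
    return (Q.T @ U_k) / s_k


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    Labels y must be in {-1, +1}. A constant feature is appended so a bias is
    learned; for simplicity that bias is regularized along with the weights.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xb[i] @ w) < 1.0  # margin check uses w before the step
            w = (1.0 - eta * lam) * w            # shrink from the regularizer
            if violated:
                w = w + eta * y[i] * Xb[i]       # hinge-loss subgradient step
    return w


def predict(w, X):
    """Sign of the linear decision function (+1 = filter, -1 = keep)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)


# Hypothetical toy data: +1 = unwanted topic to filter, -1 = normal mail.
train_docs = [
    "cheap pills discount offer buy now",
    "buy cheap watches discount offer",
    "win prize offer click now",
    "meeting schedule project report",
    "project report deadline meeting notes",
    "lunch meeting schedule tomorrow",
]
y_train = np.array([1, 1, 1, -1, -1, -1])

vocab = sorted({tok for doc in train_docs for tok in doc.split()})
A = term_doc_matrix(train_docs, vocab)
U_k, s_k, Vt_k = lsi(A, k=2)
X_train = Vt_k.T  # one row per training document; equals fold_in(A, U_k, s_k)

w = train_linear_svm(X_train, y_train)

new_docs = ["discount offer buy pills", "project meeting tomorrow"]
X_new = fold_in(term_doc_matrix(new_docs, vocab), U_k, s_k)
train_pred = predict(w, X_train)
new_pred = predict(w, X_new)
print(new_pred)
```

In this toy set the two topics share no terms, so each of the two latent dimensions captures one topic and the folded-in messages should fall on the matching side of the learned hyperplane. A practical filter would usually weight A (for example with tf-idf) and pick k by validation; those choices are assumptions here, not details taken from the paper.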
Authors: Yang Qing, Li Fangmin
Source: Computer Engineering and Applications, 2006, No. 35, pp. 168-171 (4 pages); indexed in CSCD and the Peking University core journal list
Funding: Natural Science Foundation of Hunan Province (06JJ50132); Hunan Provincial Fund for Distinguished Young Scholars (03JJY1012)
Keywords: Support Vector Machine (SVM); Latent Semantic Indexing (LSI); Information Retrieval (IR); supervised learning; text classification
