期刊文献+

一种适合于超大规模特征集的匹配方法 被引量:2

Pattern Matching for Super Large Patterns Set
下载PDF
导出
摘要 串匹配技术是入侵检测系统中的关键技术,随着特征数量的增加,现有的自动机类匹配算法都会面对内存占用过大的问题.当特征超过一定数目后,自动机可能根本无法构造.文中提出了一种针对超大规模特征匹配(SLSPM)环境的匹配算法SLSPM.SLSPM算法借助一个块式匹配自动机和若干个普通自动机完成匹配工作,而且能够支持至少上万规模的特征集.与普通匹配自动机先读入状态再判断读入符号的方式不同,SLSPM首先使用散列函数判断当前文本块是否可以被过滤掉.如果文本块无法被过滤且为合法文本块时,再检查当前状态是否是一个能够识别当前文本块的状态.仅在当前状态吻合的情况下再读入下一个文本块进行后续匹配.理论证明显示SLSPM算法具有近似O(n)的复杂度.由于SLSPM算法未能保存全部的跳转信息,其匹配速度相对于高级AhoCorasick算法未有大幅提升.算法的优势在于,该算法在软件环境下能够维持与AC算法相同的匹配性能,而且能够将特征加载规模至少提升至上万以适应超大规模特征集匹配环境. The current string matching algorithms nearly can not afford the burden of large memorydemand when the patters amount increases dramatically.Matching automaton can not be estab-lished at all when the amount of patterns is at least tens of thousands.We present a solution tothe problem of super large scale patterns matching (SLSPM).In our design,a matching trie isdivided into one block matching trie and many general character matching tries if possible.Duringa block matching procedure our block matching automaton (trie)does not read the current statefirst.Instead,the automaton first reads the current text block symbol and decides whether it willbe matched or not by a hash function.Then,the automaton looks for the current state in thestates set in which all the states recognize the same current text block symbol.After the currentstate is found the automaton continues to read the next text block symbol.The theoretical analysisshows that under the worst case the proposed algorithm takes O(n)time approximately,where n is the length of the text.The experiment results show that our design matches only a little fasterthan the advanced Aho-Corasick because in the advanced Aho-Corasick the entire possible transitioninformation has been stored.The advantage of SLSPMis that under software environmentSLSPMis not slower than AC during the matching procedure,and also at least tens of thousands patterns can be loaded into the hybrid automatons of SLSPMso that it can be used well for superlarge scale patters matching environment.
出处 《计算机学报》 EI CSCD 北大核心 2014年第5期1147-1158,共12页 Chinese Journal of Computers
基金 国家"九七三"重点基础研究发展规划项目基金(2011CB302605) 国家"十一五"科技支撑计划(2012BAH37B01) 国家"八六三"高技术研究发展计划项目基金(2012AA012502 2011AA010705 2012AA012506)资助
关键词 网络安全 超大规模特征匹配 串匹配 混合自动机 算法 信息安全 network security super large scale patterns matching string matching hybrid automaton algorithm information security
  • 相关文献

参考文献13

  • 1Navarro G, Raffinot M. Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences. Cambridge.. Cambridge University Press, 2002. 被引量:1
  • 2Aho A V, Corasick M J. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 1975, 18(6) : 333-340. 被引量:1
  • 3Coit C J, Staniford S, MeAlerney J. Towards faster string matching for intrusion detection or exceeding the speed of snort//Proceedings of the DARPA Information Survivability Conference& Exposition II. Anaheim, USA, 2001:367-373. 被引量:1
  • 4Wu Sun, Manber U. A fast algorithm for multi-pattern searching. University of Arizona, Tucson: Technical Report TR-94-17, 1994. 被引量:1
  • 5Song Tian, Wang Dongsheng. A path combinational method for multiple pattern matching//Proceedings of the 5th ACM/ IEEE Symposium on Architectures for Networking and Communications Systems. Princeton, USA, 2009:76-77. 被引量:1
  • 6Piyaehon P, Luo Yan. Efficient memory utilization on network processors for deep packet inspection//Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems. San Jose, USA, 2006:71-80. 被引量:1
  • 7Dharmapurikar S, Lockwood J. Fast and scalable pattern matching for content filtering//Proceedings of the Symposium on Architectures for Networking and Communications. Princeton, USA, 2005:183-192. 被引量:1
  • 8胡玥,高庆狮,郭莉,王培凤.巨型多不确定串匹配完全自动机及其快速生成算法[J].中国科学:信息科学,2011,41(5):552-561. 被引量:2
  • 9吴碧海,赵有健.网络入侵检测中多模式匹配的状态编码方法[J].清华大学学报(自然科学版),2009(4):612-615. 被引量:3
  • 10李志东,杨武,张汝波,王巍.基于异构隐式存储的多模式匹配算法[J].通信学报,2009,30(3):119-124. 被引量:6

二级参考文献13

  • 1宋华,戴一奇.一种用于内容过滤和检测的快速多关键词识别算法[J].计算机研究与发展,2004,41(6):940-945. 被引量:22
  • 2刘萍,谭建龙.XML内容筛选中的快速串匹配算法[J].中文信息学报,2005,19(2):20-27. 被引量:3
  • 3贺龙涛,方滨兴,余翔湛.一种时间复杂度最优的精确串匹配算法[J].软件学报,2005,16(5):676-683. 被引量:25
  • 4Alfred V Aho, Margaret J Corasick. Efficient string matching: an aid to bibliographic search [J]. Communication oftheACM, 1975, 18(6): 333-340. 被引量:1
  • 5Commentz-Walter B. A string matching algorithm fast on the average [C]// ICALP 1979. Proc of the 6th Colloquium on Automata, Languages and Programming. Lodon : Springer-Verlag, 1979:118 - 132. 被引量:1
  • 6Wu S, Manber U. A fast algorithm for multi-pattern searching, TR-94-17 [R]. Arizona : Department of Computer Science, University of Arizona, 1994. 被引量:1
  • 7Tan L, Sherwood T. A high throughput string matching architecture for intrusion detection and prevention [C]// ISCA’05. Proc of the 32nd International Symposium on Computer Architecture. Washinton: IEEE Computer Society, 2005:112 - 122. 被引量:1
  • 8Brodie B C, Ron K C, Taylor D E. A scalable architecture for high throughput regular expression pattern matching [C]//ISCA’06. Proc of the 33rd International Symposium on Computer Architecture. New York: ACM, 2006:191 - 202. 被引量:1
  • 9Lunteren J V. High performance pattern matching for intrusion detection [C]//Infocom' 06. Proe of Infocom’06. Barcelona: IEEE Infocom, 2006:1-13. 被引量:1
  • 10Kumar S, Dharmapurikar S, Yu F, et al. Algorithms to accelerate multiple regular expressions matching for deep packet inspection [C]// SIGCOMM’06. Proc of SIGCOMM’06. New York: ACM, 2006: 339 - 350. 被引量:1

共引文献8

同被引文献21

引证文献2

二级引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部