Abstract
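String matching is a key technique in intrusion detection systems. As the number of patterns grows, existing automaton-based matching algorithms all run into excessive memory consumption, and once the pattern count passes a certain threshold the automaton may not be constructible at all. This paper proposes SLSPM, a matching algorithm for super-large-scale pattern matching (SLSPM) environments. SLSPM performs matching with one block matching automaton and several ordinary automata, and supports pattern sets of at least tens of thousands of entries. Unlike an ordinary matching automaton, which reads the current state first and then examines the input symbol, SLSPM first uses a hash function to decide whether the current text block can be filtered out. If the block cannot be filtered out and is a valid text block, SLSPM then checks whether the current state is one that can recognize that block. Only when the current state matches does it read the next text block and continue matching. Theoretical analysis shows that SLSPM has approximately O(n) complexity. Because SLSPM does not store the complete transition information, its matching speed is not substantially higher than that of the advanced Aho-Corasick algorithm. The advantage of the algorithm is that, in a software environment, it maintains the same matching performance as the AC algorithm while raising the number of loadable patterns to at least tens of thousands, which makes it suitable for super-large-scale pattern matching.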
Current string matching algorithms can barely cope with the memory demand that arises when the number of patterns increases dramatically. A matching automaton cannot be built at all when the number of patterns reaches tens of thousands or more. We present a solution to the problem of super-large-scale pattern matching (SLSPM). In our design, a matching trie is divided, where possible, into one block matching trie and many general character matching tries. During block matching, our block matching automaton (trie) does not read the current state first. Instead, it first reads the current text block symbol and uses a hash function to decide whether that symbol can be matched. The automaton then looks for the current state in the set of states that all recognize the same current text block symbol. Once the current state is found, the automaton goes on to read the next text block symbol. Theoretical analysis shows that in the worst case the proposed algorithm takes approximately O(n) time, where n is the length of the text. The experimental results show that our design matches only slightly faster than the advanced Aho-Corasick algorithm, because the advanced Aho-Corasick algorithm stores all possible transition information. The advantage of SLSPM is that, in a software environment, it is not slower than AC during matching, and at least tens of thousands of patterns can be loaded into its hybrid automata, so it is well suited to super-large-scale pattern matching environments.
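The block-first lookup order described in the abstract can be illustrated with a small sketch. This is not the paper's SLSPM implementation: the function names `build` and `search`, the block size `BLOCK = 2`, and the use of a Python dictionary as the hash filter are all assumptions made for illustration. For simplicity the sketch restarts matching at every text offset, so it does not reproduce the approximately O(n) bound claimed for SLSPM; it only shows the order of the two checks (hash filter on the text block first, then a lookup of the current state among the states that recognize that block).

```python
from collections import defaultdict

BLOCK = 2  # assumed block size in characters (hypothetical, not from the paper)


def build(patterns, block=BLOCK):
    """Build a block-level trie that is indexed by block symbol, not by state."""
    by_block = defaultdict(dict)   # by_block[block_symbol][state] -> next state
    tails = defaultdict(list)      # tails[state] -> [(leftover chars, full pattern)]
    next_state = 1                 # state 0 is the root
    for pat in patterns:
        state = 0
        full = len(pat) - len(pat) % block   # length covered by whole blocks
        for i in range(0, full, block):
            sym = pat[i:i + block]
            if state not in by_block[sym]:
                by_block[sym][state] = next_state
                next_state += 1
            state = by_block[sym][state]
        # Characters that do not fill a whole block are matched one by one.
        tails[state].append((pat[full:], pat))
    return by_block, tails


def search(text, by_block, tails, block=BLOCK):
    """Return (start offset, pattern) pairs for every occurrence in text."""
    hits = []
    for start in range(len(text)):
        state, pos = 0, start
        while True:
            # Report patterns whose block part ends in this state.
            for tail, pat in tails.get(state, ()):
                if text.startswith(tail, pos):
                    hits.append((start, pat))
            sym = text[pos:pos + block]
            # Check 1: hash lookup on the text block; unknown blocks are filtered.
            candidates = by_block.get(sym)
            if candidates is None:
                break
            # Check 2: is the current state among the states recognizing this block?
            nxt = candidates.get(state)
            if nxt is None:
                break
            state, pos = nxt, pos + block
    return hits


by_block, tails = build(["abcd", "abc", "cd", "x"])
print(search("xabcdab", by_block, tails))
```

Indexing transitions by block symbol means that a text block absent from every pattern is rejected with a single hash lookup, before any state is consulted. A real implementation would also need a more compact table than nested dictionaries to reach the memory savings the abstract targets.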
Source
Chinese Journal of Computers (计算机学报)
EI
CSCD
PKU Core Journals (北大核心)
2014, No. 5, pp. 1147-1158 (12 pages)
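Funding
Supported by the National Basic Research Program of China (973 Program) (2011CB302605)
National Key Technology R&D Program of the 11th Five-Year Plan (2012BAH37B01)
National High Technology Research and Development Program of China (863 Program) (2012AA012502, 2011AA010705, 2012AA012506)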
Keywords
network security
super-large-scale pattern matching
string matching
hybrid automaton
algorithm
information security