Abstract
The Internet currently holds a very large volume of text data. To address the problem of accurately and quickly locating multiple different target strings in such large texts, an improved fast multi-pattern matching algorithm is proposed. It builds on a review of the advantages and disadvantages of common pattern matching algorithms and incorporates the idea of converting the Trie tree into double-array form. A comparison experiment was carried out. The results show that the improved AC algorithm successfully matches all of the queried pattern strings in the text and that its matching speed is about 5 times that of the AC algorithm, indicating that the improved AC algorithm performs well in matching speed, recall, and space utilization.
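As an illustration of the idea the abstract names, the following is a minimal sketch of an AC (Aho-Corasick) automaton whose Trie transitions are stored in double-array form (`base` and `check`). It is not the authors' implementation: the function names `build_matcher` and `find_all`, the first-fit slot placement, and the character-code assignment are all assumptions made for this example.

```python
from collections import deque


def build_matcher(patterns):
    """Build an AC automaton with its goto function in base/check arrays (a sketch)."""
    # Step 1: ordinary dict-based trie; node 0 is the root.
    children = [{}]   # children[node] maps character -> child node
    out = [[]]        # out[node] lists the patterns that end at node
    for p in patterns:
        node = 0
        for ch in p:
            if ch not in children[node]:
                children.append({})
                out.append([])
                children[node][ch] = len(children) - 1
            node = children[node][ch]
        out[node].append(p)

    # Step 2: place every node in the double array (assumed first-fit strategy).
    alphabet = sorted({ch for p in patterns for ch in p})
    code = {ch: i + 1 for i, ch in enumerate(alphabet)}  # codes start at 1
    base, check = [0], [0]   # slot 0 is the root; free slots have check == -1
    slot_of = {0: 0}         # trie node -> state index in the double array
    queue = deque([0])
    while queue:
        node = queue.popleft()
        kids = children[node]
        if not kids:
            continue
        codes = [code[ch] for ch in kids]
        b = 1
        while True:
            top = b + max(codes)
            if top >= len(check):          # grow both arrays on demand
                grow = top + 1 - len(check)
                base.extend([0] * grow)
                check.extend([-1] * grow)
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1
        s = slot_of[node]
        base[s] = b
        for ch, child in kids.items():
            t = b + code[ch]
            check[t] = s                   # slot t is owned by parent state s
            slot_of[child] = t
            queue.append(child)

    def step(s, c):
        """Return the goto target from state s on code c, or -1 if none."""
        t = base[s] + c
        return t if t < len(check) and check[t] == s else -1

    # Step 3: failure links and merged outputs, computed breadth-first.
    fail = [0] * len(base)
    outputs = {slot_of[n]: list(out[n]) for n in range(len(children)) if out[n]}
    queue = deque(children[0].values())    # depth-1 states fail to the root
    while queue:
        node = queue.popleft()
        s = slot_of[node]
        for ch, child in children[node].items():
            c, t = code[ch], slot_of[child]
            f = fail[s]
            while f != 0 and step(f, c) == -1:
                f = fail[f]
            nxt = step(f, c)
            fail[t] = nxt if nxt != -1 else 0
            if fail[t] in outputs:         # inherit patterns ending at the suffix
                outputs.setdefault(t, []).extend(outputs[fail[t]])
            queue.append(child)
    return base, check, fail, outputs, code


def find_all(text, matcher):
    """Return (start_index, pattern) for every occurrence of a pattern in text."""
    base, check, fail, outputs, code = matcher
    hits, s = [], 0
    for i, ch in enumerate(text):
        c = code.get(ch)
        if c is None:      # character occurs in no pattern: restart at the root
            s = 0
            continue
        while True:
            t = base[s] + c
            if t < len(check) and check[t] == s:
                s = t
                break
            if s == 0:
                break
            s = fail[s]
        for p in outputs.get(s, ()):
            hits.append((i - len(p) + 1, p))
    return hits


matcher = build_matcher(["he", "she", "his", "hers"])
print(find_all("ushers", matcher))
```

Each transition in this sketch costs one addition and one array comparison (`check[base[s] + c] == s`) instead of a hash lookup, which is the usual motivation for the double-array layout. The sketch makes no claim about reproducing the roughly 5x speedup reported in the abstract.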
Authors
陈永杰 (CHEN Yongjie)
吾守尔·斯拉木 (Wushour Silamu)
于清 (YU Qing)
School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal (北大核心)
2019, Issue 4, pp. 89-93 (5 pages)
Funding
National Key Basic Research Program of China ("973" Program) (2014CB340506)
Keywords
character string matching
multi-pattern matching
Trie tree
double array
AC algorithm
matching speed