Abstract
This paper studies how to quickly recognize and filter Chinese text that has been modified (mutated), restricting the mutation rules to the five methods most common on the Chinese-language Internet. It presents a fast and accurate multi-pattern approximate matching algorithm for Chinese text. The algorithm builds on the WM algorithm and incorporates compressed encoding, which makes it suitable for processing network information in real time. Experiments show that an information filtering system based on the algorithm supports a large number of input patterns, recognizes patterns with an accuracy above 99%, and runs with high efficiency. The algorithm has broad application prospects in Chinese information filtering.
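For orientation, the following is a minimal sketch of plain WM-style (Wu-Manber) exact multi-pattern matching, the base technique the abstract names. It is not the paper's algorithm: the five mutation rules and the compressed encoding are not specified in this record and are not reproduced here. The function names `build_wm_tables` and `wm_search`, the block size of 2, and the use of Python dictionaries for the shift and bucket tables are all illustrative assumptions.

```python
from collections import defaultdict


def build_wm_tables(patterns, block=2):
    """Build the shift table and the suffix-block buckets (illustrative helper)."""
    if not patterns or any(not p for p in patterns):
        raise ValueError("patterns must be non-empty strings")
    m = min(len(p) for p in patterns)   # only the first m characters drive the tables
    b = min(block, m)                   # block size cannot exceed m
    default_shift = m - b + 1           # safe shift for blocks that occur in no pattern
    shift = {}
    bucket = defaultdict(list)
    for p in patterns:
        for end in range(b, m + 1):
            blk = p[end - b:end]
            # Distance from this block's end to the end of the m-character prefix.
            shift[blk] = min(shift.get(blk, default_shift), m - end)
        # Patterns are grouped by the last block of their m-character prefix.
        bucket[p[m - b:m]].append(p)
    return m, b, default_shift, shift, bucket


def wm_search(text, patterns, block=2):
    """Return (start_index, pattern) for every exact occurrence in text."""
    m, b, default_shift, shift, bucket = build_wm_tables(patterns, block)
    hits = []
    i = m - 1                           # index of the last character of the window
    while i < len(text):
        blk = text[i - b + 1:i + 1]
        s = shift.get(blk, default_shift)
        if s == 0:
            start = i - m + 1
            # Verify every candidate pattern that shares this suffix block.
            for p in bucket.get(blk, ()):
                if text.startswith(p, start):
                    hits.append((start, p))
            i += 1
        else:
            i += s
    return hits


if __name__ == "__main__":
    print(wm_search("abcdefgh xbcde", ["cde", "efgh"]))
```

Because Python strings are sequences of Unicode code points, the same sketch runs unchanged on Chinese text, with each character treated as one symbol. A fuzzy variant along the lines the abstract describes would additionally need to map mutated input back to canonical patterns before or during matching; how the paper does this is not stated here.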
Source
《高技术通讯》
CAS
CSCD
PKU Core (Peking University core journal list)
2005, No. 9, pp. 7-12 (6 pages)
Chinese High Technology Letters
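Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China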