Abstract
When constructing regular expressions (REs) for applications, deliberately introducing beneficial ambiguities can simplify RE construction, whereas leaving harmful ambiguities in an RE compromises the correctness of the matching results. To treat these two kinds of ambiguity differently, an algorithm based on AND/OR trees is proposed that checks for and locates harmful ambiguities in REs. The algorithm reduces the workload of debugging REs. Experiments show that it outperforms existing automaton-based ambiguity-checking algorithms in running time, memory use, and practicality. A visual RE editing and debugging environment built on the algorithm has been used to construct China's first online integrated biological data warehouse.
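To make the notion of ambiguity concrete, the following minimal sketch counts how many distinct ways a small RE can match a whole string; a count above 1 means the RE is ambiguous on that input. This is a brute-force illustration only and is not the AND/OR tree algorithm of the paper. The tuple-based RE representation and the name `count_parses` are assumptions made for this example, and the sketch does not attempt to decide whether an ambiguity is beneficial or harmful, since that depends on the application.

```python
from functools import lru_cache

# Hypothetical RE representation (an assumption for this sketch):
#   ("chr", c)      matches the single character c
#   ("cat", r, s)   matches r followed by s
#   ("alt", r, s)   matches r or s
#   ("star", r)     matches zero or more repetitions of r


def count_parses(regex, text):
    """Count the distinct ways `regex` can match all of `text`."""

    @lru_cache(maxsize=None)
    def count(node, i, j):
        # Number of ways `node` matches the slice text[i:j].
        kind = node[0]
        if kind == "chr":
            return 1 if j - i == 1 and text[i] == node[1] else 0
        if kind == "alt":
            return count(node[1], i, j) + count(node[2], i, j)
        if kind == "cat":
            # Try every split point between the two operands.
            return sum(
                count(node[1], i, k) * count(node[2], k, j)
                for k in range(i, j + 1)
            )
        if kind == "star":
            if i == j:
                return 1  # zero iterations
            # Each iteration must consume at least one character,
            # which keeps the number of parses finite.
            return sum(
                count(node[1], i, k) * count(node, k, j)
                for k in range(i + 1, j + 1)
            )
        raise ValueError(f"unknown node kind: {kind!r}")

    return count(regex, 0, len(text))


a, b, c = ("chr", "a"), ("chr", "b"), ("chr", "c")

# (a|ab)(c|bc): "abc" splits as a + bc or as ab + c.
ambiguous = ("cat", ("alt", a, ("cat", a, b)), ("alt", c, ("cat", b, c)))

# a(b|c): every matching string has exactly one parse.
unambiguous = ("cat", a, ("alt", b, c))

print(count_parses(ambiguous, "abc"))   # 2
print(count_parses(unambiguous, "ab"))  # 1
```

A checker of this kind only reports that an ambiguity exists for one given input string; locating the responsible sub-expressions, as the abstract describes, requires additional bookkeeping that this sketch omits.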
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals
2006, No. 2, pp. 173-178 (6 pages)
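Funding
Supported by the National High-Tech Research and Development Program of China (863 Program) (No. 2002AA231011)
and the Shanghai Major Science and Technology Project (No. 02DJ14013)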