Abstract
The performance of a Chinese word segmentation system directly affects all downstream processing, and how well the system handles ambiguous strings is an important measure of its quality. Before an ambiguity can be resolved, the ambiguous string must first be located. Building on the earlier increasing maximum matching (增字最大匹配) algorithm, this paper proposes a method for identifying ambiguous strings that combines character-by-character (literal) scanning with reverse maximum matching. Experimental results show that the proposed algorithm runs faster than the increasing maximum matching algorithm.
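The abstract does not spell out the steps of the proposed method, so the following is only a minimal sketch of one way the two named ingredients could be combined, not the authors' algorithm. It assumes a plain set-based dictionary, an assumed maximum word length `max_len`, and hypothetical function names (`reverse_max_match`, `find_overlap_ambiguities`). The text is first segmented by reverse maximum matching; then each word boundary is scanned character by character for dictionary words that straddle it, which signals a possible overlapping ambiguity.

```python
def reverse_max_match(text, dictionary, max_len=4):
    """Segment text right to left, taking the longest dictionary word
    that ends at the current position (single characters as fallback)."""
    words = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the window from the left until it is a dictionary word
        # or only one character remains.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()
    return words


def find_overlap_ambiguities(text, dictionary, max_len=4):
    """Return the RMM segmentation and a sorted list of (start, end, word)
    spans for dictionary words that cross an RMM word boundary."""
    words = reverse_max_match(text, dictionary, max_len)

    # Character offsets of the boundaries between adjacent RMM words.
    boundaries = []
    pos = 0
    for w in words[:-1]:
        pos += len(w)
        boundaries.append(pos)

    spans = set()
    for b in boundaries:
        # Scan every candidate that starts before b and ends after b.
        for s in range(max(0, b - max_len + 1), b):
            for e in range(b + 1, min(len(text), s + max_len) + 1):
                if text[s:e] in dictionary:
                    spans.add((s, e, text[s:e]))
    return words, sorted(spans)


if __name__ == "__main__":
    # Toy dictionary chosen for illustration only.
    toy_dict = {"结合", "合成", "成分", "分子"}
    segmentation, ambiguous = find_overlap_ambiguities("结合成分子", toy_dict)
    print(segmentation)
    print(ambiguous)
```

In this sketch the boundary scan only inspects windows of at most `max_len` characters around each boundary, so the extra cost over plain reverse maximum matching stays small; whether that matches the efficiency gain reported in the paper is not something the abstract allows one to confirm.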
Source
Modern Electronics Technique (《现代电子技术》)
2012, No. 8, pp. 107-109 (3 pages)
Keywords
Chinese word segmentation
reverse maximum matching algorithm
ambiguity recognition
algorithm optimization