Abstract
This paper presents an improvement to the Chinese character segmentation algorithm based on units amalgamation. The improved algorithm modifies advanced amalgamation, the core stage of the original algorithm: it looks for the best amalgamating combination among all units that can be amalgamated, which avoids certain amalgamation errors that the original advanced amalgamation step can cause. Tests on many real samples show that the improvement removes the errors the original algorithm produces in some cases without degrading any of its other performance measures, and thereby raises the segmentation accuracy.
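The abstract does not spell out how the best amalgamating combination is found. The sketch below is only an illustration under assumptions that do not come from the paper: each unit is reduced to its (left, right) column span on a text line, a merged character is scored by how far its width deviates from an estimated character width `char_width`, multi-unit groups wider than `max_ratio * char_width` are rejected, and a left-to-right greedy merge (`greedy_merge`) stands in for a sequential merging step. `best_merge` uses dynamic programming to search every way of splitting the unit sequence into contiguous groups and returns the minimum-cost combination. All names and the cost function are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: each unit is the (left, right) column span of a
# connected component on one text line, sorted left to right.
Unit = Tuple[int, int]


def merge_cost(units: Sequence[Unit], i: int, j: int,
               char_width: float, max_ratio: float = 1.2) -> float:
    """Assumed cost of amalgamating units[i..j] (inclusive) into one character.

    A multi-unit group wider than max_ratio * char_width is rejected (infinite
    cost); a single unit is always allowed. Otherwise the cost is the deviation
    of the merged width from the estimated character width.
    """
    width = units[j][1] - units[i][0]
    if j > i and width > max_ratio * char_width:
        return math.inf
    return abs(width - char_width)


def greedy_merge(units: Sequence[Unit], char_width: float,
                 max_ratio: float = 1.2) -> List[List[int]]:
    """Left-to-right merge: keep absorbing the next unit while the box fits.

    Stands in for a sequential merging step; not the paper's exact rule.
    """
    groups: List[List[int]] = []
    start = 0
    while start < len(units):
        end = start
        while (end + 1 < len(units)
               and merge_cost(units, start, end + 1, char_width, max_ratio) < math.inf):
            end += 1
        groups.append(list(range(start, end + 1)))
        start = end + 1
    return groups


def best_merge(units: Sequence[Unit], char_width: float,
               max_ratio: float = 1.2) -> List[List[int]]:
    """Search all partitions of the unit sequence into contiguous groups.

    Dynamic programming: best[k] is the minimum total cost of amalgamating
    units[0..k-1], and back[k] is the start index of the last group.
    """
    n = len(units)
    best = [math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for k in range(1, n + 1):
        for i in range(k):
            c = best[i] + merge_cost(units, i, k - 1, char_width, max_ratio)
            if c < best[k]:
                best[k], back[k] = c, i
    # Walk the back-pointers to recover the chosen groups.
    groups: List[List[int]] = []
    k = n
    while k > 0:
        groups.append(list(range(back[k], k)))
        k = back[k]
    return groups[::-1]
```

With units `[(0, 8), (10, 12), (13, 20)]` and `char_width=10`, the greedy rule absorbs the narrow middle unit into the first character and returns `[[0, 1], [2]]` (total cost 5), while the exhaustive search returns `[[0], [1, 2]]` (total cost 2). The dynamic program covers all 2^(n-1) contiguous partitions of n units with O(n²) cost evaluations instead of enumerating them one by one.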
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
1999, No. 2, pp. 33-39 (7 pages)
Funding
Natural Science Foundation
National "863" High-Tech Program
Keywords
units amalgamation
advanced amalgamation
Chinese character segmentation algorithm
Chinese character recognition system
segmentation method
best amalgamating combination