Abstract
Character segmentation is a key technique in Uyghur handwriting recognition, but cursive (connected) writing within words and stroke drift in handwriting make it difficult. To address these problems, a character segmentation algorithm based on path optimization with multiple information fusion is proposed. The strokes of a word image are extracted, segmented and clustered to over-segment the word into two types of sections: main and affix. Fuzzy section matching then yields a robust sequence of oversegmentation primitives, which suppresses the interference caused by stroke drift. Matching information is estimated with a Gaussian model of the matching position; recognition information is obtained by applying a confidence transformation to the outputs of a single-character classifier; and word-level semantic information is obtained from data statistics. A second-order Markov language model of character sequences is constructed, and, based on the Bayes criterion, a weighted multi-information fusion method for computing the posterior probability of a word is derived. Path optimization over section matching and primitive merging then yields the optimal character segmentation. Experiments on a handwritten Uyghur sample database show that the proposed algorithm effectively improves the accuracy and stability of character segmentation.
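The abstract does not give the paper's exact formulas, so the following is only a minimal sketch of how the described pieces could combine, under stated assumptions. It assumes the word posterior is scored as a weighted log-linear sum of three terms per candidate character: a Gaussian log-likelihood of a matching-position offset, a sigmoid confidence transformation of a raw classifier score, and a second-order Markov (trigram) character probability. A Viterbi-style search then chooses how consecutive primitives are merged into characters. All names (`Candidate`, `log_gaussian`, `log_sigmoid_confidence`, `best_segmentation`, `trigram`, `weights`, `mu`, `sigma`) are hypothetical, and the sigmoid form and the meaning of the offset are assumptions rather than the authors' definitions.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate record (assumption, not from the paper):
# (character label, raw classifier score, normalized matching-position offset).
Candidate = Tuple[str, float, float]


def log_gaussian(x: float, mu: float, sigma: float) -> float:
    """Log-likelihood of a matching offset under an assumed 1-D Gaussian model."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_sigmoid_confidence(score: float, a: float = 1.0, b: float = 0.0) -> float:
    """Log of an assumed sigmoid confidence transformation, computed stably."""
    t = a * score + b
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))


def best_segmentation(
    n: int,
    candidates: Dict[Tuple[int, int], List[Candidate]],
    trigram: Callable[[str, str, str], float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    mu: float = 0.0,
    sigma: float = 1.0,
) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Viterbi search over merges of primitives 0..n-1.

    candidates[(i, j)] lists character hypotheses for primitives i..j-1 merged.
    trigram(c2, c1, c) returns P(c | c2, c1) from a second-order Markov model.
    Returns the best fused log score and its (start, end, char) path.
    """
    w_match, w_rec, w_lm = weights
    by_start: Dict[int, List[Tuple[int, Candidate]]] = {}
    for (i, j), cands in candidates.items():
        if not 0 <= i < j <= n:
            raise ValueError(f"bad span {(i, j)}")
        for cand in cands:
            by_start.setdefault(i, []).append((j, cand))

    # layers[k]: last two characters -> (best score, best path) at boundary k.
    layers: List[Dict[Tuple[str, str], Tuple[float, list]]] = [
        {} for _ in range(n + 1)
    ]
    layers[0][("<s>", "<s>")] = (0.0, [])
    for i in range(n):
        for (c2, c1), (score, path) in layers[i].items():
            for j, (ch, raw, offset) in by_start.get(i, []):
                # Weighted fusion of matching, recognition and language terms.
                s = (score
                     + w_match * log_gaussian(offset, mu, sigma)
                     + w_rec * log_sigmoid_confidence(raw)
                     + w_lm * math.log(trigram(c2, c1, ch)))
                key = (c1, ch)
                if key not in layers[j] or s > layers[j][key][0]:
                    layers[j][key] = (s, path + [(i, j, ch)])

    # Close the word with an end-of-word transition and keep the best state.
    finals = [
        (score + w_lm * math.log(trigram(c2, c1, "</s>")), path)
        for (c2, c1), (score, path) in layers[n].items()
    ]
    if not finals:
        raise ValueError("no complete segmentation path")
    return max(finals, key=lambda t: t[0])
```

For example, with two primitives, a confident two-character reading `a`,`b` should beat a poorly matched single merged character `c` under a uniform trigram:

```python
cands = {(0, 1): [("a", 3.0, 0.0)], (1, 2): [("b", 3.0, 0.0)], (0, 2): [("c", -3.0, 2.0)]}
score, path = best_segmentation(2, cands, lambda c2, c1, c: 0.1)
```

Keeping the last two characters in the search state makes the trigram term exact at the cost of more states; in practice the fusion weights would presumably be tuned on held-out data.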
Source
Journal of Xi'an Jiaotong University (西安交通大学学报)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2013, No. 8, pp. 68-73, 86 (7 pages)
Funding
National Natural Science Foundation of China (60872141)
Fundamental Research Funds for the Central Universities (K50510010007)
Huawei Technology Fund (HITC2011023)
Keywords
information processing technology
handwriting recognition
character segmentation
Uyghur language
multiple information fusion