Abstract
To improve the readability of characters in document images and the accuracy of segmentation and recognition, printed Uighur document images were studied, and segmentation methods were proposed for the two most difficult problems: connected-segment (word) segmentation and character segmentation. Connected segments were separated by combining a run-length-code connected-region algorithm with an overlap-degree calculation. Because Uighur characters are joined to one another along the baseline, the baseline position of each word was first estimated and the cut points between characters were then located on it. The segmentation results show that the proposed algorithm performs better than the other algorithms compared.
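The abstract names the techniques but not their details. The Python sketch below is one hypothetical reading of the pipeline, not the authors' implementation. It assumes a binary image stored as a list of rows, 8-connectivity between runs on adjacent rows, an overlap degree defined as the shared column span divided by the narrower box's width (merge threshold 0.5, chosen arbitrarily), the baseline taken as the row with the largest horizontal projection, and cut points placed midway along column runs whose ink lies only on the baseline. All function names and the `SAMPLE` image are illustrative.

```python
from itertools import groupby
from typing import List, Tuple

Image = List[List[int]]          # binary image, 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive


def row_runs(img: Image) -> List[Tuple[int, int, int]]:
    """Run-length encode each row into (row, start_col, end_col) ink runs."""
    runs = []
    for r, row in enumerate(img):
        c = 0
        for value, group in groupby(row):
            n = len(list(group))
            if value:
                runs.append((r, c, c + n - 1))
            c += n
    return runs


def connected_components(img: Image) -> List[Box]:
    """Merge runs into 8-connected components with union-find; return boxes."""
    runs = row_runs(img)
    parent = list(range(len(runs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (r1, s1, e1) in enumerate(runs):
        for j in range(i + 1, len(runs)):
            r2, s2, e2 = runs[j]
            if r2 > r1 + 1:
                break  # runs are ordered by row; later runs cannot touch run i
            # Runs on adjacent rows touch if their spans overlap or meet diagonally.
            if r2 == r1 + 1 and s2 <= e1 + 1 and s1 <= e2 + 1:
                parent[find(i)] = find(j)

    boxes = {}
    for i, (r, s, e) in enumerate(runs):
        root = find(i)
        t, l, b, rt = boxes.get(root, (r, s, r, e))
        boxes[root] = (min(t, r), min(l, s), max(b, r), max(rt, e))
    # Sorted left to right only for determinism; Uighur is read right to left.
    return sorted(boxes.values(), key=lambda bx: bx[1])


def overlap_degree(a: Box, b: Box) -> float:
    """Assumed definition: shared column span / width of the narrower box."""
    shared = min(a[3], b[3]) - max(a[1], b[1]) + 1
    narrower = min(a[3] - a[1], b[3] - b[1]) + 1
    return max(shared, 0) / narrower


def merge_by_overlap(boxes: List[Box], threshold: float = 0.5) -> List[Box]:
    """Attach boxes (e.g. dots) overlapping a neighbour; threshold is assumed."""
    merged: List[Box] = []
    for box in sorted(boxes, key=lambda bx: bx[1]):
        if merged and overlap_degree(merged[-1], box) >= threshold:
            t, l, b, r = merged[-1]
            merged[-1] = (min(t, box[0]), min(l, box[1]), max(b, box[2]), max(r, box[3]))
        else:
            merged.append(box)
    return merged


def baseline_row(img: Image, box: Box) -> int:
    """Assumed estimate: the row holding the most ink pixels inside the box."""
    top, left, bottom, right = box
    counts = [sum(img[r][left:right + 1]) for r in range(top, bottom + 1)]
    return top + counts.index(max(counts))


def cut_points(img: Image, box: Box, band: int = 1) -> List[int]:
    """Cut at the middle of each run of columns whose only ink lies within
    `band` rows of the baseline, i.e. a stroke joining two characters."""
    top, left, bottom, right = box
    base = baseline_row(img, box)
    runs: List[List[int]] = []
    for c in range(left, right + 1):
        ink = [r for r in range(top, bottom + 1) if img[r][c]]
        if ink and all(abs(r - base) <= band for r in ink):
            if runs and runs[-1][-1] == c - 1:
                runs[-1].append(c)
            else:
                runs.append([c])
    # Runs touching the box edge are word-initial/final tails, not joins.
    return [(run[0] + run[-1]) // 2 for run in runs
            if run[0] > left and run[-1] < right]


SAMPLE = [  # two strokes joined on the baseline, a dot below, a second blob
    "..............", ".#......#.....", ".#......#...##", ".#......#...##",
    "##########..##", "..............", "........#.....",
]
IMG = [[1 if ch == "#" else 0 for ch in row] for row in SAMPLE]
```

On this 7 x 14 sample, `connected_components(IMG)` finds three components, `merge_by_overlap` attaches the dot under the second stroke to the first word, `baseline_row` places the baseline at row 4, and `cut_points` returns a single cut at column 4, midway along the joining stroke. A plain projection rule like this may also cut inside characters whose bodies lie flat on the baseline, so it should be read as an illustration of the idea rather than a complete segmenter.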
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University core journal (北大核心)
2016, No. 7, pp. 1892-1897 (6 pages)
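Funding
Research Fund of the Special Training Program for Ethnic Minority Science and Technology Talents, Xinjiang Uygur Autonomous Region (201323121)
Key Project of the Scientific Research Program for Higher Education Institutions, Xinjiang Uygur Autonomous Region (XJEDU2013I11)
2014 Open Project of the National Laboratory of Pattern Recognition (201306321)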
Keywords
document image processing
run-length code connected region algorithm
overlap-degree algorithm
word segmentation
character segmentation