Abstract
Binarization is an important step in optical character recognition (OCR) and directly affects its success rate. Current local binarization algorithms based on luminance segmentation give good results, but they are complex and time-consuming. Fast binarization algorithms are simple but sensitive to noise. Low-luminance images generally contain non-negligible noise, and their text has low contrast. To capture low-contrast text, a fast binarization algorithm must be sensitive to luminance gradients, but this sensitivity also causes broken text, missing text, and strong background noise in the binarized result. To achieve fast, high-quality binarization, this paper uses non-local means filtering to suppress noise while avoiding over-smoothing the image. An improved Bradley algorithm then extracts the low-contrast text and resolves problems such as text breakage. Finally, dilation and erosion are applied to suppress binarization noise. The method is suitable for images with non-uniform low luminance and non-uniform high luminance. Experimental results show that under non-uniform high luminance the method performs on par with other fast binarization algorithms, while under non-uniform low luminance it extracts more text with less text breakage and less noise. The OCR recall of the binarization results reaches 93.5%.
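For orientation, the thresholding stage can be illustrated with the standard Bradley adaptive threshold, which compares each pixel with the mean of its neighborhood computed from an integral image. This is only a minimal sketch of the baseline algorithm, not the improved variant described in the abstract, whose details are not given here. The function names, the default `window` size, and the sensitivity `t = 0.15` are assumptions chosen for illustration; the non-local means denoising and the dilation/erosion steps are omitted.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def bradley_threshold(img, window=None, t=0.15):
    """Baseline Bradley threshold on a grayscale image (list of rows).

    A pixel becomes 0 (text) when it is more than a fraction t darker
    than the mean of the window centered on it; otherwise 255.
    The defaults for window and t are illustrative assumptions.
    """
    h, w = len(img), len(img[0])
    if window is None:
        window = max(w // 8, 1)
    half = window // 2
    sat = integral_image(img)
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        # Clip the window to the image borders.
        y0, y1 = max(y - half, 0), min(y + half + 1, h)
        for x in range(w):
            x0, x1 = max(x - half, 0), min(x + half + 1, w)
            count = (y1 - y0) * (x1 - x0)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            # Compare pixel * count with total * (1 - t) to avoid a division.
            if img[y][x] * count <= total * (1.0 - t):
                out[y][x] = 0
    return out
```

A smaller `t` makes the threshold more sensitive to weak luminance gradients, which helps with low-contrast text but also admits more background noise. That is the trade-off the abstract describes and the reason it adds denoising before thresholding and morphological filtering after it.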
Authors
WANG Kang-wei (王康维)
ZHAO Lei (赵磊)
HUANG Xin-yan (黄鑫炎)
PENG Yu-fa (彭玉发)
MA Si-yuan (马思远)
FAN Hong-bo (范虹伯)
School of Applied Sciences, Harbin University of Science and Technology, Harbin, Heilongjiang Province 150000, China
Source
Journal of Optoelectronics·Laser (《光电子.激光》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2020, Issue 12, pp. 1333-1340 (8 pages)
Funding
Supported by the College Student Innovation and Entrepreneurship Training Program (201810214035).