Abstract
Extracting and recognizing characters in images of signs photographed in complex natural scenes has long been an active problem in digital image processing. Existing algorithms generally suffer from low precision, low recall, and poor robustness. This paper proposes an efficient text extraction algorithm. Because sign images usually have a complex natural background, the original image is first blurred. Laplacian edge detection is then applied, and long edges that do not belong to text are deleted from the edge image. Finally, edge scanning and connected component analysis, guided by the characteristics of text regions, extract the sign text. Tests on a large number of images from the ICDAR 2003 Robust Reading Competition show that the algorithm is fairly robust to background complexity, text language, color, font and size, and text orientation, and that it achieves comparatively high precision and recall.
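The stages named in the abstract (blurring, Laplacian edge detection, deletion of long non-text edges, connected component analysis) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the smoothing kernel, thresholds, or edge-scanning rules, so the 3x3 mean blur, the function names, and the parameters `edge_threshold`, `max_side`, and `min_pixels` are all assumptions, and the edge-scanning stage is omitted.

```python
from collections import deque

def box_blur(img):
    """3x3 mean blur with border replication (assumed stand-in for the blurring step)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out

def laplacian_edges(img, threshold):
    """Binary edge map: 1 where |4-neighbour Laplacian| >= threshold."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            up = img[max(y - 1, 0)][x]
            down = img[min(y + 1, h - 1)][x]
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            lap = up + down + left + right - 4.0 * img[y][x]
            edges[y][x] = 1 if abs(lap) >= threshold else 0
    return edges

def connected_components(mask):
    """8-connected components of a binary mask, each a list of (y, x) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, comp = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(comp)
    return components

def extract_text_boxes(img, edge_threshold=20.0, max_side=40, min_pixels=4):
    """Return candidate text boxes as (x0, y0, x1, y1), inclusive pixel coordinates."""
    edges = laplacian_edges(box_blur(img), edge_threshold)
    boxes = []
    for comp in connected_components(edges):
        ys = [p[0] for p in comp]
        xs = [p[1] for p in comp]
        x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
        # Assumed rule: an edge component longer than max_side is a non-text edge.
        too_long = max(x1 - x0 + 1, y1 - y0 + 1) > max_side
        if too_long or len(comp) < min_pixels:
            continue
        boxes.append((x0, y0, x1, y1))
    return sorted(boxes)
```

Here `img` is a grayscale image given as a list of rows of intensities. In practice the filtering rules would be tuned on data such as the ICDAR 2003 images, and a library such as OpenCV would replace the hand-written loops.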
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2007, No. 24, pp. 246-248 (3 pages)
Computer Engineering and Applications
Funding
Outstanding Young Teachers Fund of China University of Geosciences (Wuhan) (No. CUGQNL0328)
Keywords
natural scene
text extraction
connected component analysis
edge scan