Abstract
Text recognition in natural scenes has great application value, but its use has been limited by the state of text detection and segmentation techniques. To detect and segment text more effectively, this paper proposes an algorithm for detecting and segmenting text in natural scene images based on connected-component features. First, the original image is decomposed into many connected components (CCs) using the Niblack method. Next, a two-stage classification module, consisting of a cascade classifier followed by an SVM, verifies the text features of these CCs. Because text CCs and non-text CCs differ in their features, the cascade classifier discards most non-text CCs, and the SVM further verifies the remaining ones, so the final output is a binary image containing only text. The algorithm was evaluated on test data; the results show a detection precision above 90% and a recall above 93%.
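The pipeline described in the abstract (Niblack binarization, connected-component extraction, then two-stage verification) can be sketched roughly as follows. This is a minimal illustration in pure Python, not the paper's implementation: the abstract does not list the features, the cascade stages, or the SVM configuration, so the features (`area`, `aspect`, `fill`), the thresholds in `CASCADE`, the window `radius`, the weight `k`, and the `second_stage` callable standing in for the SVM are all assumptions made for the example.

```python
from collections import deque
from math import sqrt

def niblack_binarize(img, radius=1, k=-0.2):
    """Mark a pixel as foreground (1) if it is darker than the local
    Niblack threshold: local mean + k * local standard deviation."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(vals) / len(vals)
            std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            out[y][x] = 1 if img[y][x] < mean + k * std else 0
    return out

def connected_components(binary):
    """Group foreground pixels into 4-connected components (lists of (y, x))."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps

def features(pixels):
    """Hypothetical per-component features (not taken from the paper)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {"area": len(pixels),
            "aspect": width / height,
            "fill": len(pixels) / (width * height)}

# Hypothetical cascade: cheap tests ordered so most non-text is rejected early.
CASCADE = [
    lambda f: f["area"] >= 3,
    lambda f: 0.1 <= f["aspect"] <= 10,
    lambda f: f["fill"] >= 0.2,
]

def two_stage_filter(comps, second_stage=lambda f: True):
    """Keep components that pass every cascade stage and then the second
    stage; `second_stage` is a placeholder for a trained SVM decision."""
    kept = []
    for pixels in comps:
        f = features(pixels)
        # all() short-circuits, so a component is dropped at its first failure.
        if all(stage(f) for stage in CASCADE) and second_stage(f):
            kept.append(pixels)
    return kept

# Toy image: bright background, a 3-pixel dark stroke, one isolated dark pixel.
demo_image = [[200] * 5 for _ in range(5)]
for y, x in [(1, 2), (2, 2), (3, 2), (0, 0)]:
    demo_image[y][x] = 0

demo_binary = niblack_binarize(demo_image)
demo_kept = two_stage_filter(connected_components(demo_binary))
```

In a real system the `second_stage` callable would wrap a trained classifier's decision function; ordering the cascade from cheapest to most expensive test is what lets the costlier second stage run on only a small fraction of the components.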
Source
Journal of Image and Graphics (《中国图象图形学报》)
CSCD
Peking University Core Journal (北大核心)
2006, No. 11, pp. 1653-1656 (4 pages)
Keywords
cascade classifier, two-stage classification, text detection, text feature