Abstract
A page segmentation algorithm based on an improved texture spectrum is proposed. The algorithm first applies an improved recursive projection profile cutting algorithm to coarsely segment the document image page, and computes texture spectrum features of small image windows from their texture units. It then classifies adjacent windows by the minimum distance method, and finally achieves precise segmentation of the text and non-text regions of the page. Experiments show that the proposed method performs well on document pages containing text, figures, and tables, and that it also adapts to other complex document pages.
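As an illustration of the texture spectrum and minimum distance steps named in the abstract, the following is a minimal sketch only. The abstract does not specify the improved texture spectrum, so the sketch assumes the classical texture unit (a three-level comparison of the eight neighbours against the centre pixel, giving a number from 0 to 6560). The L1 distance, the function names, and the labels "text" and "non-text" are likewise assumptions made for the example, not the authors' implementation.

```python
from collections import Counter

# Clockwise offsets of the eight neighbours, starting at the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def texture_unit_number(img, r, c):
    """Classical texture unit number (0..6560) of the 3x3 block centred at (r, c)."""
    centre = img[r][c]
    number = 0
    for i, (dr, dc) in enumerate(NEIGHBOURS):
        value = img[r + dr][c + dc]
        # Three-level code: 0 if darker, 1 if equal, 2 if brighter than the centre.
        code = 0 if value < centre else (1 if value == centre else 2)
        number += code * 3 ** i
    return number


def texture_spectrum(img):
    """Normalised histogram of texture unit numbers over all interior pixels."""
    counts = Counter(
        texture_unit_number(img, r, c)
        for r in range(1, len(img) - 1)
        for c in range(1, len(img[0]) - 1)
    )
    total = sum(counts.values())
    return {unit: n / total for unit, n in counts.items()}


def spectrum_distance(a, b):
    """L1 distance between two sparse spectra (assumed metric)."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


def classify(window, references):
    """Label of the reference spectrum nearest to the window's spectrum."""
    spectrum = texture_spectrum(window)
    return min(references,
               key=lambda label: spectrum_distance(spectrum, references[label]))


# Toy windows: a flat patch and a vertically striped patch (hypothetical data).
flat = [[128] * 6 for _ in range(6)]
stripes = [[0 if c % 2 == 0 else 255 for c in range(6)] for _ in range(6)]
references = {"non-text": texture_spectrum(flat), "text": texture_spectrum(stripes)}
print(classify(stripes, references))
```

In a full pipeline, the windows would come from the blocks produced by the coarse projection profile cut, and the reference spectra would be estimated from labelled samples rather than from toy patches.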
Funding
Supported by the Key Project of the Natural Science Foundation of the Anhui Provincial Department of Education (KJ2009A054, KJ2007A076)
Keywords
document image
image segmentation
texture spectrum