Abstract
This paper proposes a frame line detection algorithm for forms based on a structural image element, the Directional Single-Connected Chain (DSCC). By exploiting both the global statistical properties of DSCC edges and the local positional relations between DSCCs, the algorithm extracts frame lines accurately from scanned form images. It is robust to line skew, to broken lines, and to character strokes inside the form cells that touch or overlap the lines. Building on this algorithm, a frame line removal method is presented that separates overlapping characters from lines, so that frame lines can be removed without damaging the touching character strokes. The method has been applied successfully in a practical form recognition system.
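The abstract does not spell out how a DSCC is built. As a rough illustration only, the sketch below assumes the usual reading of the term: a horizontal DSCC is a sequence of vertical black runs in adjacent columns, where each run is linked to the next only if the two runs are each other's sole neighbour on that side. The function names `vertical_runs` and `horizontal_chains`, the list-of-lists image format, and the row-overlap connectivity test are assumptions of this sketch, not details taken from the paper, and the line fitting and line removal stages are not shown.

```python
from typing import List, Tuple

# A vertical black run: (column, top row, bottom row), rows inclusive.
Run = Tuple[int, int, int]


def vertical_runs(img: List[List[int]]) -> List[List[Run]]:
    """Collect the vertical black runs of a binary image, column by column."""
    height = len(img)
    width = len(img[0]) if height else 0
    columns: List[List[Run]] = []
    for x in range(width):
        runs: List[Run] = []
        y = 0
        while y < height:
            if img[y][x]:
                top = y
                while y < height and img[y][x]:
                    y += 1
                runs.append((x, top, y - 1))
            else:
                y += 1
        columns.append(runs)
    return columns


def overlaps(a: Run, b: Run) -> bool:
    """Assumed connectivity test: the two runs share at least one row."""
    return a[1] <= b[2] and b[1] <= a[2]


def horizontal_chains(img: List[List[int]]) -> List[List[Run]]:
    """Group vertical runs into chains that never branch or merge."""
    cols = vertical_runs(img)

    def right(r: Run) -> List[Run]:
        x = r[0] + 1
        return [s for s in cols[x] if overlaps(r, s)] if x < len(cols) else []

    def left(r: Run) -> List[Run]:
        x = r[0] - 1
        return [s for s in cols[x] if overlaps(r, s)] if x >= 0 else []

    def linked(r: Run, s: Run) -> bool:
        # r and s are singly connected: each is the other's only neighbour.
        return right(r) == [s] and left(s) == [r]

    chains: List[List[Run]] = []
    for runs in cols:
        for r in runs:
            lefts = left(r)
            if len(lefts) == 1 and linked(lefts[0], r):
                continue  # r already belongs to a chain started further left
            chain = [r]
            while True:
                nxt = right(chain[-1])
                if len(nxt) == 1 and linked(chain[-1], nxt[0]):
                    chain.append(nxt[0])
                else:
                    break
            chains.append(chain)
    return chains
```

In this sketch a thin horizontal line crossed by a vertical stroke still forms a single chain, with one unusually tall run at the crossing. That is the kind of outlier that statistics over the chain's edges could flag, which is consistent with the robustness to touching strokes claimed in the abstract.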
Source
《电子与信息学报》
EI
CSCD
Peking University Core Journals (北大核心)
2002, No. 9, pp. 1190-1196 (7 pages)
Journal of Electronics & Information Technology
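Funding
National 863 Program (National High Technology Research and Development Program of China)
National Natural Science Foundation of China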