Abstract
Conventional text recognition techniques have difficulty extracting structured information from bills. This paper proposes a general-purpose method for extracting structured information from bills. The method first matches each bill to a template using a perspective transformation. It then extracts the text inside the target regions defined by preset keyword coordinates, and finally assembles that text into structured information. Experiments show that the method is robust and transfers well, so applications for new bill types can be developed quickly. The bills used in the experiments are ID cards, high-speed (EMU) train tickets, and value-added tax (VAT) invoices.
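The abstract does not give implementation details, so the following is only a minimal sketch of the pipeline under stated assumptions. It assumes that the four corners of the bill in the photo have already been located (the abstract does not say how), that an OCR engine has returned each text line as `(text, center_x, center_y)` in photo pixels, and that the field boxes in `TEMPLATE_FIELDS` (hypothetical names and coordinates) were marked by hand on an upright template. The sketch estimates the perspective transform (a 3×3 homography with its last entry fixed to 1) from the four corner pairs. It then maps each OCR center into template coordinates and assigns the text to the preset box that contains it. The paper describes warping the bill itself onto the template. This sketch swaps in a variant that maps only the OCR coordinates; both rest on the same transform.

```python
"""Minimal sketch: perspective-based template matching plus field lookup.

Assumptions (not from the paper): corner correspondences are already known,
OCR output is a list of (text, center_x, center_y) in photo pixels, and the
field boxes below are hypothetical.
"""

# Hypothetical keyword boxes on an upright 400 x 250 template: (x0, y0, x1, y1).
TEMPLATE_FIELDS = {
    "name": (60, 30, 250, 60),
    "id_number": (120, 200, 390, 235),
}


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def solve_homography(src, dst):
    """Return h0..h7 (h8 fixed to 1) mapping 4 src points onto 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6*x + h7*y + 1) = h0*x + h1*y + h2, rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    return solve_linear(a, b)


def apply_homography(h, x, y):
    """Project one point through the homography (divide by w)."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


def extract_fields(photo_corners, template_corners, ocr_items, fields):
    """Map OCR centers into template space and group text by field box."""
    h = solve_homography(photo_corners, template_corners)
    hits = {name: [] for name in fields}
    for text, x, y in ocr_items:
        tx, ty = apply_homography(h, x, y)
        for name, (x0, y0, x1, y1) in fields.items():
            if x0 <= tx <= x1 and y0 <= ty <= y1:
                hits[name].append((tx, text))
    # Join each field's fragments left to right to form the structured record.
    return {name: "".join(t for _, t in sorted(parts))
            for name, parts in hits.items()}


if __name__ == "__main__":
    photo = [(10, 20), (810, 20), (810, 520), (10, 520)]  # TL, TR, BR, BL
    template = [(0, 0), (400, 0), (400, 250), (0, 250)]
    ocr = [("ZHANG San", 310, 110), ("3501082000", 510, 450), ("noise", 10, 20)]
    print(extract_fields(photo, template, ocr, TEMPLATE_FIELDS))
    # prints {'name': 'ZHANG San', 'id_number': '3501082000'}
```

Mapping points rather than warping pixels keeps the sketch dependency-free. With OpenCV, `cv2.getPerspectiveTransform` computes the same 3×3 matrix from four point pairs, and `cv2.warpPerspective` produces the rectified image. When corner estimates are noisy, `cv2.findHomography` can fit more than four correspondences robustly.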
Author
CHEN Xiang (陈翔)
School of Information Technology Engineering, Fuzhou Polytechnic, Fuzhou, China, 350108
Source
Journal of Fujian Computer (《福建电脑》)
2022, Issue 11, pp. 63-66 (4 pages)
Keywords
Text Recognition
Perspective Transformation
Bill Identification