
Document Image Recognition: Retrospective and Perspective of Technology
Abstract [Objective] Document images are a widespread type of data with significant application value. The main goal of document image recognition (also called character recognition or OCR) is to detect the text in document images and convert it into machine-encoded (electronic) text. Since the 1950s, research on document recognition and its applications have advanced tremendously. This paper gives researchers and engineers a fairly comprehensive overview of document image recognition technology, to support both technical innovation and practical applications. [Methods] After introducing the application needs of document recognition, the paper reviews the main methods in the history of the field, analyzes the current state of the art and research trends, including the strengths and weaknesses of existing methods, and looks ahead to future developments. [Results] From the 1950s to the 2000s, a large body of effective methods was developed for statistical pattern recognition, feature extraction, structural analysis, character segmentation, character string recognition and layout analysis. [Conclusions] In recent years, deep learning (deep neural networks, DNNs) has become the dominant approach and has markedly improved the performance of text detection and recognition. However, shortcomings remain in complex layout analysis and in the reliability and generalization of text recognition.
Author: Liu Chenglin (刘成林) (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing 100190, China)
Source: Frontiers of Data & Computing (数据与计算发展前沿), 2019, No. 2, pp. 17-25 (9 pages)
Funding: National Natural Science Foundation of China (61721004).
Keywords: document recognition; layout analysis; text detection; deep learning; character recognition; text line recognition