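Application Research on Italic Character Recognition Technology in Agricultural Literature Knowledge Acquisition (农业文献知识获取中斜体字符识别技术的应用研究). Cited by: 2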

Research on detection method of English italic characters in agricultural knowledge acquisition

Abstract: Traditional optical character recognition (OCR) systems are trained on samples that do not include italic characters, so they cannot recognize italic text correctly, which hinders knowledge acquisition from agricultural literature. Adding italic characters to the training samples would increase sample complexity and could also affect the recognition of upright (regular) characters. To address this problem, this paper presents a method for detecting and correcting English italic characters. Each text line is first split into words, and each word is further subdivided into individual characters. The morphological features of each character are then examined and used to determine the shape of the word. Finally, all words detected as italic are collected and used to calculate the precise slant angle of the italic characters, which is then corrected. Practical results from knowledge acquisition on agricultural literature show that the method achieves good detection and correction performance.
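The abstract does not specify which morphological features or which angle-estimation formula the authors use, so the following is only a minimal sketch of the general detect-then-correct idea, not the paper's method. It assumes each word is already segmented into a binary NumPy array (ink = 1), swaps in a common projection-profile technique (try candidate shear angles and keep the one that makes the column profile sharpest), and uses hypothetical names throughout (`deshear`, `estimate_slant`, `correct_italic_words`, `make_strokes`) plus an arbitrary 5-degree threshold.

```python
import math
import numpy as np

def deshear(img, angle_deg):
    """Shift each row so strokes leaning right by angle_deg become upright."""
    h, w = img.shape
    t = math.tan(math.radians(float(angle_deg)))
    pad = int(math.ceil(abs(t) * (h - 1)))           # room for the largest shift
    out = np.zeros((h, w + 2 * pad), dtype=img.dtype)
    for y in range(h):
        dx = int(round(t * (h - 1 - y)))             # bottom row does not move
        out[y, pad - dx:pad - dx + w] = img[y]
    return out

def estimate_slant(img, candidates=range(-30, 31)):
    """Return the candidate angle (degrees) giving the sharpest column profile."""
    def score(angle):
        cols = deshear(img, angle).sum(axis=0).astype(np.float64)
        return float(np.sum(cols ** 2))              # large when ink piles into few columns
    return float(max(candidates, key=score))

def correct_italic_words(words, threshold_deg=5.0):
    """Flag words whose slant exceeds the (assumed) threshold, pool their
    angles with a median, and deshear only the flagged words."""
    angles = [estimate_slant(w) for w in words]
    italic = [a for a in angles if a > threshold_deg]
    if not italic:
        return list(words), 0.0
    angle = float(np.median(italic))
    fixed = [deshear(w, angle) if a > threshold_deg else w
             for w, a in zip(words, angles)]
    return fixed, angle

def make_strokes(slope, height=40, n=4, gap=12):
    """Synthetic 'word' for experiments: n parallel strokes, slope >= 0,
    where slope is horizontal run per unit of vertical rise."""
    width = n * gap + int(math.ceil(slope * height)) + gap
    img = np.zeros((height, width), dtype=np.uint8)
    for i in range(n):
        for y in range(height):
            x = gap // 2 + i * gap + int(round(slope * (height - 1 - y)))
            img[y, x] = 1
    return img
```

For example, `correct_italic_words([make_strokes(0.25), make_strokes(0.0)])` should deshear only the first synthetic word and leave the upright one untouched. Pooling the angle over all italic words, rather than correcting each word with its own estimate, mirrors the abstract's step of computing one angle from the collected italic words; the median is an assumption chosen for robustness to outliers.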

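Authors: 金花 (Jin Hua), 朱亚涛 (Zhu Yatao), 靳志强 (Jin Zhiqiang)

Affiliation: Hebei Agricultural University (河北农业大学)

Source: Journal of Hebei Agricultural University (《河北农业大学学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2015, No. 6, pp. 124-128 (5 pages)

Funding: Youth Fund for Science and Technology Research of Higher Education Institutions of Hebei Province (Z2012142); Baoding Science and Technology Research and Development Guidance Plan projects (13ZN025, 13ZF098); Natural Science Project of the Baoding Association for Science and Technology (KX2013A20); Science and Engineering Fund of Hebei Agricultural University (LG20120604)

Keywords: OCR; italic detection; italic correction; agricultural knowledge acquisition

Classification: TP39 [Automation and Computer Technology: Computer Application Technology]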


