Abstract
Using computers to reconstruct large numbers of regularly shaped document fragments can greatly improve efficiency and reduce labor costs, so this work has attracted wide attention in the academic community. Matching regularly shaped English-text fragments currently faces three main problems: 1) fragment features are difficult to extract; 2) splicing efficiency is low; 3) splicing accuracy is low. For the first problem, a series of statistical processing steps removes the interference caused by the varying heights of English letters, and the standard pixel height of each line of characters is extracted as the fragment's feature vector. For the second problem, an optimization model is established and, under the constraint that every class contains the same number of fragments, an ant colony algorithm performs fast horizontal clustering. For the third problem, the gray values of the pixels in the 8-neighborhood of the characters are counted to build a distance function between two fragments, and the ant colony algorithm is then used for matching and precise clustering. Finally, the fragments in Attachment 5 of Problem B of the 2013 National Higher Education Cup mathematical modeling contest are used as the test case to verify the feasibility and effectiveness of the method.
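The abstract does not give the form of the distance function, so the following is only a minimal sketch of the general idea, under stated assumptions: fragments are grayscale images of equal height, and each pixel on the right edge of one fragment is compared with the three pixels of its 8-neighborhood that would lie across the seam in the other fragment. The names `edge_distance` and `best_right_neighbor` are hypothetical, and a plain nearest-neighbor choice stands in for the ant colony matching used in the paper.

```python
from typing import List

# Assumed representation: rows of gray values, 0 = black ink, 255 = white paper.
Image = List[List[int]]


def edge_distance(left: Image, right: Image) -> float:
    """Mismatch between the right edge of `left` and the left edge of `right`.

    For each pixel on the right edge of `left`, the pixels of its
    8-neighborhood that would lie across the seam (rows y-1, y, y+1 of the
    first column of `right`) are averaged and compared with the pixel itself.
    This is an illustrative choice, not the paper's formula.
    """
    if len(left) != len(right):
        raise ValueError("fragments must have the same height")
    height = len(left)
    total = 0.0
    for y in range(height):
        # Clip the neighborhood at the top and bottom of the fragment.
        rows = range(max(0, y - 1), min(height, y + 2))
        across = [right[r][0] for r in rows]
        total += abs(left[y][-1] - sum(across) / len(across))
    return total / height


def best_right_neighbor(index: int, fragments: List[Image]) -> int:
    """Greedy stand-in for the matching step: pick the closest other fragment."""
    candidates = [j for j in range(len(fragments)) if j != index]
    return min(candidates, key=lambda j: edge_distance(fragments[index], fragments[j]))


# Tiny example: the ink on the right edge of `a` continues into `b`, not `c`.
a = [[255, 0], [255, 0], [255, 255]]
b = [[0, 255], [0, 255], [255, 255]]
c = [[255, 255], [255, 255], [255, 255]]
match = best_right_neighbor(0, [a, b, c])
```

A smaller distance means the two edges are more likely to be adjacent in the original page; a search procedure such as the ant colony algorithm would then use these pairwise distances to order the fragments within a row.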
Authors
田献珍
孙立强
田振中
TIAN Xian-zhen; SUN Li-qiang; TIAN Zhen-zhong (Lushan College of Guangxi University of Science and Technology, Liuzhou, Guangxi 545000, China; Guangxi University of Science and Technology, Liuzhou, Guangxi 545000, China)
Source
《计算机科学》
CSCD
PKU Core Journal
2020, Issue S02, pp. 231-235 (5 pages)
Computer Science
Funding
Key Project of the Guangxi Higher Education Undergraduate Teaching Reform Program (2018JGZ160).
Keywords
English fragments
Feature vector
Ant colony clustering algorithm
Optimization model
Distance function