Abstract
To address the problem that pixel-based image representations of Chinese characters fail to capture the essential features of the characters and have high space complexity, a graph feature extraction method for Chinese characters is proposed. The method comprises three parts: binarization of the character image, skeleton extraction, and graph feature extraction. Binarization removes noise from the image and improves the accuracy of graph feature extraction; skeleton extraction retains the important pixels and discards irrelevant ones; graph feature extraction combines the key points of a character with a graph data structure to represent the character's shape. Experiments were carried out on 3,908 commonly used Chinese characters in five fonts. The results show that the method correctly extracts the graph features of characters with complex strokes and effectively represents their essential features. Up to 3,195 characters have identical graph features across different fonts, indicating that the method is relatively stable. On average, each character can be represented by 22.6 graph nodes and 19.1 edges, which greatly reduces space complexity compared with representing character features as a single-channel image.
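The three stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the algorithms, so this sketch assumes a fixed-threshold binarization, Zhang-Suen thinning for the skeleton, and key points defined as skeleton pixels whose crossing number differs from 2 (stroke ends and junctions). The function names (`binarize`, `skeletonize`, `extract_graph`) and the simple edge-tracing heuristic are hypothetical, and the heuristic is only adequate for clean, one-pixel-wide toy strokes.

```python
# Hypothetical sketch: binarize -> thin -> key points + edges (pure Python).
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
STRAIGHT = [(-1, 0), (0, 1), (1, 0), (0, -1)]
DIAG = [(-1, 1), (1, 1), (1, -1), (-1, -1)]

def binarize(gray, threshold=128):
    """Assumed rule: dark ink (< threshold) becomes 1, background 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def _neighbours(img, r, c):
    """Values of the 8 neighbours, clockwise from north; outside is 0."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in RING]

def _crossing(img, r, c):
    """Number of 0 -> 1 transitions around the pixel's neighbour ring."""
    p = _neighbours(img, r, c)
    return sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))

def skeletonize(img):
    """Zhang-Suen thinning (assumed choice of skeleton algorithm)."""
    img = [row[:] for row in img]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(len(img)):
                for c in range(len(img[0])):
                    if not img[r][c]:
                        continue
                    p = _neighbours(img, r, c)
                    if step == 0:
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= sum(p) <= 6 and _crossing(img, r, c) == 1 and ok:
                        to_clear.append((r, c))
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img

def _next_steps(p, pixels, seen):
    """Unvisited skeleton neighbours, 4-connected ones first."""
    r, c = p
    straight = [(r + dr, c + dc) for dr, dc in STRAIGHT
                if (r + dr, c + dc) in pixels and (r + dr, c + dc) not in seen]
    diag = [q for q in ((r + dr, c + dc) for dr, dc in DIAG)
            if q in pixels and q not in seen
            and not any(abs(q[0] - s[0]) + abs(q[1] - s[1]) == 1 for s in straight)]
    return straight + diag

def extract_graph(skel):
    """Nodes: pixels with crossing number != 2. Edges: traced stroke segments."""
    pixels = {(r, c) for r, row in enumerate(skel) for c, v in enumerate(row) if v}
    nodes = sorted(p for p in pixels if _crossing(skel, *p) != 2)
    index = {p: i for i, p in enumerate(nodes)}
    edges = set()
    for start in nodes:
        for first in _next_steps(start, pixels, {start}):
            seen, cur = {start, first}, first
            while cur not in index:          # walk until the next key point
                steps = _next_steps(cur, pixels, seen)
                if not steps:
                    break
                cur = steps[0]
                seen.add(cur)
            if cur in index and cur != start:
                edges.add(tuple(sorted((index[start], index[cur]))))
    return nodes, sorted(edges)

# Toy 7x7 grayscale image of a one-pixel-wide cross (0 = ink, 255 = paper).
gray = [[0 if (r == 3 and 1 <= c <= 5) or (c == 3 and 1 <= r <= 5) else 255
         for c in range(7)] for r in range(7)]
nodes, edges = extract_graph(skeletonize(binarize(gray)))
print(nodes)  # [(1, 3), (3, 1), (3, 3), (3, 5), (5, 3)]
print(edges)  # [(0, 2), (1, 2), (2, 3), (2, 4)]
```

In this toy case the 49-pixel image reduces to 5 nodes and 4 edges, which is the kind of saving the abstract reports at larger scale. Real glyph skeletons contain corners, clustered junction pixels, and noise, so a practical version would need more careful junction handling than this tracing heuristic provides.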
Authors
唐善成
梁少君
戴风华
来坤
曹瑶倩
TANG Shan-cheng; LIANG Shao-jun; DAI Feng-hua; LAI Kun; CAO Yao-qian (Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China; CCCC Second Highway Engineering Bureau Co., Ltd., Xi'an 710054, China)
Source
《科学技术与工程》
Peking University Core Journal
2024, No. 2, pp. 658-664 (7 pages)
Science Technology and Engineering
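Funding
National Key Research and Development Program of China (2018YFC0808300)
Key Industry Innovation Chain (Group) Project of the Shaanxi Science and Technology Program (2020ZDLGY15-07)
Science and Technology Innovation Guidance Project of the Xi'an Science and Technology Program (201805036YD14CG20(4))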
Keywords
Chinese character recognition
graph features
graph data structure