Abstract
A Chinese character image carries font information as well as character identity. Font information is an important cue for layout analysis, understanding, and reconstruction, and it also helps in building high-performance character recognition systems. Existing font recognition methods operate on blocks of text and cannot recognize the font of a single Chinese character. This paper proposes a new font recognition method that identifies the font of a single Chinese character without prior knowledge of which character it is. The single-character image is first decomposed with a wavelet transform, and wavelet features are extracted from the transformed image. The extracted features are reshaped with a Box-Cox transformation, and linear discriminant analysis (LDA) is then used for feature selection to obtain the final font recognition features. Classification is performed by an MQDF classifier. Experiments on a sample set covering 7 fonts show that the proposed method recognizes the font of a single Chinese character effectively without knowing the character, reaching a single-character font recognition rate of 97.35%.
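The first two stages described in the abstract (wavelet decomposition followed by a Box-Cox transformation) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the wavelet, the number of decomposition levels, the exact feature definition, nor the Box-Cox parameter, so the Haar wavelet, the mean absolute sub-band value used as a feature, `levels=2`, `lam=0.5`, and the offset `eps` are all hypothetical choices, as are the function names. The LDA and MQDF stages are omitted.

```python
import math


def haar_step(values):
    """One level of the orthonormal 1-D Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    low = [(a + b) / s for a, b in pairs]
    high = [(a - b) / s for a, b in pairs]
    return low, high


def transpose(block):
    return [list(col) for col in zip(*block)]


def haar_2d(image):
    """One level of 2-D Haar decomposition of a list-of-rows image.

    Returns four sub-bands (ll, lh, hl, hh). The first letter refers to the
    filter applied along rows, the second to the filter applied along columns.
    """
    row_low, row_high = [], []
    for row in image:
        low, high = haar_step(row)
        row_low.append(low)
        row_high.append(high)

    def along_columns(block):
        lows, highs = [], []
        for col in transpose(block):
            low, high = haar_step(col)
            lows.append(low)
            highs.append(high)
        return transpose(lows), transpose(highs)

    ll, lh = along_columns(row_low)
    hl, hh = along_columns(row_high)
    return ll, lh, hl, hh


def mean_abs(band):
    return sum(abs(v) for row in band for v in row) / (len(band) * len(band[0]))


def wavelet_features(image, levels=2):
    """Assumed feature: mean absolute value of every sub-band at each level."""
    features = []
    current = image
    for _ in range(levels):
        ll, lh, hl, hh = haar_2d(current)
        features.extend(mean_abs(band) for band in (lh, hl, hh))
        current = ll  # recurse on the low-pass band
    features.append(mean_abs(current))
    return features


def box_cox(x, lam):
    """Box-Cox power transform, defined for x > 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def font_features(image, levels=2, lam=0.5, eps=1e-6):
    """Wavelet features reshaped by Box-Cox; eps keeps the inputs positive."""
    return [box_cox(f + eps, lam) for f in wavelet_features(image, levels)]
```

The image side length must be divisible by 2 raised to `levels`. In a full pipeline along the lines the abstract describes, the output of `font_features` would be the input to LDA-based feature selection and then to the MQDF classifier.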
Source
《电子学报》
EI
CAS
CSCD
Peking University Core Journals
2004, No. 2, pp. 177-180 (4 pages)
Acta Electronica Sinica
Keywords
Font recognition
Single character
Wavelet features
LDA
MQDF
Classification (of information)
Feature extraction
Image analysis
Image reconstruction
Natural language processing systems
Wavelet transforms