Abstract
Research on Uyghur character recognition has considerable theoretical value and broad application prospects. This paper presents a recognition method for multi-font, multi-size printed Uyghur characters. Pre-classification information is first used to divide the entire character set into several subsets. Two recognition schemes are then applied, normalizing the input character to a 32×32 and a 24×24 dot matrix, respectively. Directional line element features are extracted from each normalized image, reduced in dimension, and classified with a modified quadratic discriminant function (MQDF). The results of the two schemes are integrated on the basis of an overall confidence value. Finally, structural and local features are used to discriminate between similar characters. On a test set of 48,800 characters the recognition accuracy reached 99.48%, which indicates that the method is effective.
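The abstract names the modified quadratic discriminant function without giving its form. The following is a minimal sketch, assuming the commonly used MQDF2 formulation, in which each class keeps its k largest covariance eigenvalues and replaces the remaining ones with a constant delta. The function names, the parameters k and delta, and the use of NumPy are illustrative assumptions and are not taken from the paper; the paper's feature extraction and dimension reduction steps are not reproduced here.

```python
import numpy as np

def fit_mqdf(X, k, delta):
    """Estimate MQDF2 parameters for one class from samples X (n_samples x d).

    k and delta are hypothetical tuning parameters: k principal axes are kept,
    and delta stands in for all remaining (minor) eigenvalues.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    lam = np.maximum(eigvals[order], delta)      # keep principal eigenvalues >= delta
    return mu, lam, eigvecs[:, order], delta

def mqdf_distance(x, params):
    """MQDF2 discriminant value; smaller means a better match to the class."""
    mu, lam, phi, delta = params
    d, k = x.shape[0], lam.shape[0]
    diff = x - mu
    proj = phi.T @ diff                          # coordinates on the principal axes
    residual = max(diff @ diff - proj @ proj, 0.0)  # energy outside the principal subspace
    return (np.sum(proj ** 2 / lam) + residual / delta
            + np.sum(np.log(lam)) + (d - k) * np.log(delta))

def classify(x, models):
    """Return the label whose MQDF2 value is smallest for feature vector x."""
    scores = {label: mqdf_distance(x, p) for label, p in models.items()}
    return min(scores, key=scores.get)
```

In use, one model would be fitted per character class on the reduced feature vectors, and `classify` would be called on each test vector. How the paper derives its confidence value from the classifier output is not stated in the abstract, so that step is left out of the sketch.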
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core Journal
2004, No. 7, pp. 946-949 (4 pages)
Journal of Tsinghua University (Science and Technology)
Funding
Supported by the National Natural Science Foundation of China (60241005)
Keywords
Uyghur optical character recognition (UOCR)
directional line element feature
similar character discrimination