Abstract
This paper applies the Latent Semantic Indexing (LSI) method from text information retrieval to handwritten digit classification. Each handwritten digit image is represented as a vector, and an unknown digit is classified by ranking its similarity to the training set of each digit class. This approach has a low computational cost and a low recognition error rate (5.5%). Next, the training samples for all digits 0-9 are arranged into a single matrix, and a singular value decomposition (SVD) of this matrix is computed. Projecting every training sample onto a suitable number of left singular vectors yields a low-order representation in which the similarities are computed. This method keeps the low error rate while greatly compressing the training data (more than 95% of the data is removed). Finally, distinguishing well-written from poorly written samples lowers the error rate further, to 4.5%.
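The two classifiers summarized above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the similarities to one training set are combined into a single score, so the hypothetical rule used here averages the `top_k` largest cosine similarities per class. The function names, `top_k`, and the rank `k` are likewise illustrative.

```python
import numpy as np

def normalize_columns(M):
    """Scale each column to unit length so that cosine similarity becomes a dot product."""
    norms = np.linalg.norm(M, axis=0)
    norms[norms == 0] = 1.0  # guard against all-zero (blank) images
    return M / norms

def class_score(sims, top_k):
    """Hypothetical scoring rule: mean of the top_k largest similarities in one class."""
    top = np.sort(sims)[::-1][:top_k]
    return float(top.mean())

def classify_full(x, train, top_k=5):
    """Full-dimensional LSI-style ranking.

    train maps digit -> (m, n_d) matrix whose columns are flattened training images.
    """
    q = x / (np.linalg.norm(x) or 1.0)
    scores = {d: class_score(normalize_columns(A).T @ q, top_k) for d, A in train.items()}
    return max(scores, key=scores.get)

def build_low_rank(train, k):
    """Stack all classes into one matrix, take its SVD, and keep k left singular vectors."""
    labels = np.concatenate([np.full(A.shape[1], d) for d, A in train.items()])
    A = np.hstack(list(train.values()))          # m x n, all digits together
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]                                # m x k basis
    coeffs = normalize_columns(Uk.T @ A)         # k x n projected training samples
    return Uk, coeffs, labels

def classify_low_rank(x, model, top_k=5):
    """Rank classes by cosine similarity in the k-dimensional projected space."""
    Uk, coeffs, labels = model
    q = Uk.T @ x
    q = q / (np.linalg.norm(q) or 1.0)
    sims = coeffs.T @ q
    scores = {d: class_score(sims[labels == d], top_k) for d in np.unique(labels)}
    return max(scores, key=scores.get)
```

With m pixels per image, n training samples, and rank k, the low-order model stores `Uk` (m×k) and the projected samples (k×n) instead of the full m×n matrix. When k is much smaller than m, this is the source of the data reduction.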
Source
Computer Engineering & Science
CSCD
Peking University Core Journal
2012, No. 6, pp. 106-110 (5 pages)
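Funding
National Natural Science Foundation of China (NSFC11071192)
International Cooperation Program of the Ministry of Science and Technology of China (2010DFA14700)
Natural Science Basic Research Plan of Shaanxi Province (SJ08E226)
Fundamental Research Funds for the Central Universities (XJJ20100107)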
Keywords
handwritten digit classification
LSI
singular value decomposition