Abstract
In Uyghur character recognition, the quality of clustering directly affects the recognition results. To improve clustering, an improved K-means clustering algorithm is proposed for clustering Uyghur word-parts (connected segments). The algorithm first selects the cluster centers several times using the equal interval method, then selects the best codebook and uses the effective similarity ratio to dynamically adjust the number of clusters K, and finally completes the word-part clustering. Experimental results show that, compared with the traditional K-means algorithm, the improved K-means algorithm achieves better clustering, with a clustering accuracy above 90%.
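The first two steps named in the abstract (repeated equal-interval selection of centers, then keeping the best codebook) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of total squared-error distortion to rank codebooks, and the way the starting offset is varied between trials are all assumptions made for illustration. The abstract does not define the effective similarity ratio, so the dynamic adjustment of K is left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def equal_interval_centers(points, k, offset):
    """Pick k initial centers at evenly spaced indices, starting at offset.

    Assumes len(points) >= k (hypothetical helper, not from the paper).
    """
    step = max(1, len(points) // k)
    return [points[(offset + i * step) % len(points)] for i in range(k)]


def kmeans(points, centers, iters=20):
    """Plain K-means (Lloyd iterations); returns (centers, distortion)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    distortion = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, distortion


def best_codebook(points, k, trials=4):
    """Run K-means from several equal-interval starts; keep the codebook
    with the lowest distortion (assumed selection criterion)."""
    step = max(1, len(points) // k)
    best = None
    for t in range(trials):
        offset = (t * step) // trials  # shift the starting index per trial
        result = kmeans(points, equal_interval_centers(points, k, offset))
        if best is None or result[1] < best[1]:
            best = result
    return best
```

A call such as `best_codebook(features, k=2)` would take a list of feature tuples, one per word-part, and return the selected centers together with their distortion. How the features are extracted from word-part images is outside the scope of this sketch.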
Source
Computer Engineering and Applications
CSCD
2014, Issue 14, pp. 135-138, 254 (5 pages in total)
Funding
National Natural Science Foundation of China (No. 61032008, No. 61163031, No. 60863009)
Keywords
Uyghur character recognition
word-part
clustering algorithm
equal interval method
effective similarity ratio
accuracy