Abstract
Moving Tibetan character internal codes to the Tibetan encoding of ISO/IEC 10646-1 is an inevitable step toward unified character encoding in computers. For a long time to come, however, several Tibetan internal codes will continue to coexist. Automatic recognition of the internal code in use is therefore the key to supporting such a multi-coded environment. This paper examines how to recognize Tibetan character internal codes automatically when multiple internal codes coexist. It presents recognition algorithms including the Internal Code Bound Recognition Algorithm, the Punctuation Recognition Algorithm, the Tibetan Character Frequency Recognition Algorithm, and the Semantic Recognition Algorithm, and then analyzes and evaluates them. On the test sample documents, the best recognition rate reached 100%.
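The abstract names the Internal Code Bound Recognition Algorithm without describing it. As a rough illustration of the general code-range idea, and not the paper's own method, the Python sketch below checks whether a byte stream looks like UTF-8-encoded ISO/IEC 10646 Tibetan. It applies three tests: the bytes must decode as UTF-8, most non-space characters must fall in the Unicode Tibetan block U+0F00..U+0FFF, and the intersyllabic tsheg U+0F0B must appear as a punctuation cue. The function names and the 0.8 threshold are hypothetical choices. Legacy Tibetan encodings would need their own range tables, and the source does not give those tables.

```python
# Hypothetical sketch of a code-range ("code bound") test for Unicode Tibetan.
# Not the paper's algorithm; names and threshold are illustrative assumptions.

# Unicode (ISO/IEC 10646) Tibetan block.
TIBETAN_START, TIBETAN_END = 0x0F00, 0x0FFF
TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG, a very common punctuation mark


def tibetan_ratio(text: str) -> float:
    """Fraction of non-whitespace characters whose code point lies in the Tibetan block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    inside = sum(1 for c in chars if TIBETAN_START <= ord(c) <= TIBETAN_END)
    return inside / len(chars)


def looks_like_unicode_tibetan(data: bytes, threshold: float = 0.8) -> bool:
    """Return True if `data` decodes as UTF-8 and is dominated by Tibetan-block code points."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so not UTF-8-encoded ISO/IEC 10646 text
    # Range check plus a punctuation cue: real Tibetan text almost always contains tsheg.
    return tibetan_ratio(text) >= threshold and TSHEG in text
```

For example, encoding "bod yig" ("\u0F56\u0F7C\u0F51\u0F0B\u0F61\u0F72\u0F42") as UTF-8 and passing it to `looks_like_unicode_tibetan` returns True, while plain ASCII text returns False. A full recognizer would run one such range test per candidate internal code and pick the best-scoring one.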
Source
Microprocessors (《微处理机》), 2009, Issue 5, pp. 69-71, 3 pages
Keywords
Computer application
Tibetan information processing
Multi-coded environment
Tibetan character internal code
Recognition algorithm