Abstract
Sorting is one of the important tasks in language information processing. Its purpose is to rearrange an arbitrary sequence of words (or phrases) into a sequence ordered by key, thereby optimizing the storage structure and improving retrieval speed. However, because of Mongolian sorting conventions and the particular characteristics of the Mongolian "UCS" encoding, word sorting cannot rely solely on the natural order of character codes. The order of words depends not only on the character codes but also, closely, on the character states. The paper therefore first defines character state and word state, and then proposes a string-sorting algorithm based on a Mealy machine. The algorithm extends traditional string-sorting algorithms and mainly solves the problem of ordering words that have the same codes but different states. Its application in the traditional Mongolian sorting program MIPT (Mongolian Information Processing Tools) confirms its effectiveness. The algorithm is highly adaptable and can solve the same sorting problem for some other complex scripts.
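The abstract does not spell out the machine or the state definitions, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a toy alphabet in which lowercase ASCII letters stand in for Mongolian letter codes and `'` stands in for a variant-selecting control code; the state names, the output ranks, and the helper names `word_state`, `sort_key`, and `sort_words` are all hypothetical. A small Mealy machine emits one character-state rank per input symbol, and words are ordered first by their letter codes and then, when the codes are equal, by the resulting word-state.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a variant-selecting control code.
SELECTOR = "'"

# Mealy machine: (state, input class) -> (next state, output).
# Outputs are illustrative character-state ranks (assumed, not from the paper):
# 0 = word-initial form, 1 = non-initial form, 2 = selector-requested variant.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, int]] = {
    ("START", "letter"): ("IN_WORD", 0),
    ("IN_WORD", "letter"): ("IN_WORD", 1),
    ("IN_WORD", "selector"): ("IN_WORD", 2),
}


def word_state(word: str) -> Tuple[int, ...]:
    """Run the Mealy machine over the word and collect its outputs."""
    state = "START"
    outputs: List[int] = []
    for ch in word:
        input_class = "selector" if ch == SELECTOR else "letter"
        try:
            state, out = TRANSITIONS[(state, input_class)]
        except KeyError:
            raise ValueError(f"unexpected {input_class} in state {state}")
        outputs.append(out)
    return tuple(outputs)


def sort_key(word: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Primary key: letter codes only. Secondary key: the word-state."""
    codes = tuple(ord(ch) for ch in word if ch != SELECTOR)
    return codes, word_state(word)


def sort_words(words: List[str]) -> List[str]:
    """Sort by codes first, breaking ties with the word-state."""
    return sorted(words, key=sort_key)


if __name__ == "__main__":
    words = ["ba", "a'b", "ab", "a"]
    print(sorted(words))      # plain code-point order
    print(sort_words(words))  # code order with state tie-breaking
```

In this toy setup, plain code-point sorting places `a'b` before `ab` only because the control code happens to have a small code value, whereas the two-level key treats both words as the same letter sequence and lets the word-state decide their relative order.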
Source
Journal of Inner Mongolia University (Natural Science Edition) (《内蒙古大学学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2008, No. 4, pp. 465-468 (4 pages)
Funding
Supported by the Natural Science Foundation of Inner Mongolia (Grant No. 200607010812)