Abstract
Spelling errors are very common in traditional Mongolian electronic texts stored in the international standard encoding (Unicode). Correcting them manually is slow and costly. This paper proposes an automatic spelling correction method for traditional Mongolian based on the statistical machine translation framework, treating spelling correction as translation from erroneous words to correct words. An improved phrase-based statistical machine translation model is used to build the spelling correction model, which is then applied to the test text. Experimental results show that the method corrects spelling errors quickly and effectively without relying on language-specific grammatical knowledge. On a test set containing 1,026 correct words and 1,102 erroneous words, the proportion of correct words in the corrected text reaches up to 97.55%.
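The idea of treating correction as translation can be illustrated with a minimal sketch. It is not the paper's model: the phrase table, the unigram language model, the weights, and the English placeholder words below are all hypothetical, chosen only to show how a translation score and a language-model score might be combined to pick a correction, and how the proportion of correct words could be measured.

```python
import math

# Hypothetical toy "phrase table": observed form -> {candidate: P(observed | candidate)}.
# English placeholder words stand in for Mongolian word forms.
PHRASE_TABLE = {
    "teh": {"the": 0.9, "ten": 0.1},
    "the": {"the": 1.0},
    "wrod": {"word": 0.8, "wood": 0.2},
}

# Hypothetical unigram language model over correct forms.
LANGUAGE_MODEL = {"the": 0.5, "ten": 0.1, "word": 0.3, "wood": 0.1}


def correct_word(observed, tm_weight=1.0, lm_weight=1.0):
    """Return the candidate with the best weighted log-score.

    Forms that are missing from the phrase table are returned unchanged.
    """
    candidates = PHRASE_TABLE.get(observed)
    if not candidates:
        return observed

    def score(candidate):
        tm = math.log(candidates[candidate])
        lm = math.log(LANGUAGE_MODEL.get(candidate, 1e-9))
        return tm_weight * tm + lm_weight * lm

    return max(candidates, key=score)


def correct_text(words):
    """Correct each word independently (no context is used in this sketch)."""
    return [correct_word(w) for w in words]


def correct_ratio(predicted, reference):
    """Proportion of positions where the predicted word equals the reference."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    corrected = correct_text(["teh", "wrod", "xyz"])
    print(corrected)
    print(correct_ratio(corrected, ["the", "word", "xyz"]))
```

A real phrase-based system would learn the table from aligned pairs of erroneous and correct text and would score sequences rather than isolated words; the sketch only shows the scoring step at the level of a single word.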
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 6, pp. 175-179 (5 pages)
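Funding
Supported by the Electronic Information Industry Development Fund of the Ministry of Industry and Information Technology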
Keywords
Mongolian
spelling check
spelling correction
machine translation