Abstract
Data sparsity in Uyghur N-gram models degrades the recognition performance of the statistical model. To find a smoothing approach suited to Uyghur, this study computes 1-gram to 3-gram statistics on a corpus of government documents and government reports, and applies four smoothing algorithms to the resulting models: additive, linear interpolation, Witten-Bell, and Kneser-Ney. The results show that, in these experiments, Kneser-Ney smoothing greatly reduces the perplexity of the Uyghur N-gram model.
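As a rough illustration of the quantities named in the abstract, the sketch below applies additive (add-k) smoothing to a bigram model and measures perplexity. It is not the paper's implementation: the function names, the toy tokens, the `<s>`/`</s>` boundary markers, and the closed-vocabulary assumption are all hypothetical, and only the simplest of the four smoothing methods is shown.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count contexts and bigrams; <s> and </s> mark sentence boundaries (assumed)."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded[1:])  # every token the model may have to predict
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def additive_prob(prev, cur, context_counts, bigram_counts, vocab_size, k=1.0):
    """Add-k estimate: (c(prev, cur) + k) / (c(prev) + k * |V|)."""
    return (bigram_counts[(prev, cur)] + k) / (context_counts[prev] + k * vocab_size)


def perplexity(sentences, context_counts, bigram_counts, vocab_size, k=1.0):
    """Perplexity = 2 ** (-average log2 probability per predicted token).

    Assumes a closed vocabulary: every test token was seen in training.
    """
    log_sum = 0.0
    n_tokens = 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            p = additive_prob(prev, cur, context_counts, bigram_counts, vocab_size, k)
            log_sum += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_sum / n_tokens)


# Toy placeholder tokens stand in for a real segmented corpus.
train = [["a", "b"], ["a", "c"]]
contexts, bigrams, vocab = train_bigram_counts(train)
ppl_add_one = perplexity(train, contexts, bigrams, len(vocab), k=1.0)
ppl_add_small = perplexity(train, contexts, bigrams, len(vocab), k=0.01)
print(ppl_add_one, ppl_add_small)
```

Lower perplexity means the model assigns higher probability to the evaluated text. Comparing smoothing methods, as the study does, amounts to swapping the probability estimate while keeping the perplexity computation fixed.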
Source
《电脑知识与技术(过刊)》
2011, Issue 6X, pp. 4177-4179 (3 pages)
Computer Knowledge and Technology
Keywords
language model
smoothing algorithm
perplexity
Uyghur-Chinese bilingual corpus