Abstract
With the development of handheld devices, language model compression has become an increasingly important research topic. This paper compresses a bigram language model by combining mutual information with an entropy-difference criterion: mutual information is first used to judge the importance of each bigram, and entropy-based pruning is then applied to obtain the final language model. Using perplexity as the evaluation metric, the model compressed by this method is compared with models produced by other methods, and the experimental results show that it performs better.
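The abstract only outlines the two-stage procedure, so a minimal Python sketch of that idea follows. It assumes pointwise mutual information for the first stage and a Stolcke-style relative-entropy contribution (the weighted log-ratio between a bigram's conditional probability and its unigram back-off) for the second; every function name and threshold here (pmi_floor, ent_threshold) is an illustrative assumption, not the authors' actual formulation.

```python
import math
from collections import Counter

def prune_bigrams(tokens, pmi_floor=0.0, ent_threshold=1e-4):
    """Two-stage bigram pruning sketch (illustrative, not the paper's
    exact formulation): stage 1 drops bigrams whose pointwise mutual
    information falls below pmi_floor; stage 2 drops bigrams whose
    removal would change the model's entropy by less than ent_threshold."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = sum(bigrams.values())  # == n_tokens - 1

    kept = {}
    for (w1, w2), c in bigrams.items():
        p_joint = c / n_bigrams           # p(w1, w2)
        p_w1 = unigrams[w1] / n_tokens    # p(w1)
        p_w2 = unigrams[w2] / n_tokens    # p(w2)

        # Stage 1: mutual information judges the bigram's importance.
        pmi = math.log(p_joint / (p_w1 * p_w2))
        if pmi < pmi_floor:
            continue

        # Stage 2: entropy difference if p(w2|w1) is replaced by the
        # unigram back-off p(w2): p(w1,w2) * log[p(w2|w1) / p(w2)].
        p_cond = c / unigrams[w1]
        ent_diff = p_joint * math.log(p_cond / p_w2)
        if ent_diff < ent_threshold:
            continue

        kept[(w1, w2)] = c
    return kept

if __name__ == "__main__":
    text = "the cat sat on the mat and the cat ran on the mat".split()
    for bigram, count in sorted(prune_bigrams(text).items()):
        print(bigram, count)
```

In a real system the second stage would apply a relative-entropy pruning criterion over the full back-off model rather than this per-bigram approximation, and perplexity on held-out text would then be used, as in the paper, to compare the pruned model against alternatives.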
Source
Journal of Soochow University Engineering Science Edition (Bimonthly) (《苏州大学学报(工科版)》), indexed in CAS, 2008, No. 3, pp. 16-20 (5 pages)
Funding
Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20060285008)
Keywords
language model compression
mutual information
difference of entropy
perplexity