Abstract
To tag Lao parts of speech effectively, this paper proposes a semi-supervised hidden Markov model (HMM) part-of-speech tagging method that incorporates word prediction. First, to address the tagging of unknown (out-of-vocabulary) words, a word prediction model based on long short-term memory (LSTM) networks is built, and the Viterbi algorithm is modified to integrate the word prediction model into the HMM. Second, to improve the accuracy and speed of HMM tagging, rules are combined with statistics: a detailed set of Lao grammar rules is developed and combined with the HMM. Third, to expand the Lao part-of-speech tagging corpus, semi-supervised learning is used to obtain forward and reverse semi-supervised HMMs. Finally, because an HMM does not consider the influence of subsequent parts of speech on the current tag, the forward and reverse semi-supervised HMMs are used together for tagging, and the tagging results are optimized. Experimental results show that the method tags Lao parts of speech effectively, reaching an accuracy of 92.55%.
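The HMM decoding step that the abstract builds on can be sketched as below. This is a minimal illustration of standard Viterbi decoding, not the authors' modified algorithm: the toy tag set, the probabilities, and the flat fallback `unk_p` for unseen words are all assumptions made here for illustration. In the paper, the unknown-word case is handled by an LSTM word prediction model, whose details the abstract does not give.

```python
import math

# Toy model; every value here is a made-up assumption for illustration only.
TAGS = ["N", "V"]
START_P = {"N": 0.7, "V": 0.3}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
EMIT_P = {"N": {"dog": 0.6, "fish": 0.4}, "V": {"runs": 0.7, "fish": 0.3}}


def viterbi(words, tags=TAGS, start_p=START_P, trans_p=TRANS_P,
            emit_p=EMIT_P, unk_p=1e-6):
    """Return the most probable tag sequence for `words` under an HMM.

    `unk_p` is a flat emission probability used when a word was never seen
    with a tag. It is a hypothetical stand-in for the paper's LSTM-based
    word prediction model.
    """
    if not words:
        return []

    def emit(tag, word):
        return emit_p[tag].get(word, unk_p)

    # best[i][t] = (log-probability of the best path ending in tag t
    #               at position i, backpointer to the previous tag)
    best = [{t: (math.log(start_p[t]) + math.log(emit(t, words[0])), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Pick the previous tag that maximizes the path score into t.
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(trans_p[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + math.log(emit(t, words[i])), prev)
        best.append(column)

    # Backtrace from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path
```

With a flat fallback, an unseen word is tagged purely from its transition context, which is the weakness a word prediction model is meant to address. A reverse model could in principle be decoded with the same function by reversing the sentence and using parameters estimated from reversed training data; how the paper combines the forward and reverse outputs is not stated in the abstract.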
Authors
WANG Xing-jin (王兴金)
ZHOU Lan-jiang (周兰江)
ZHANG Jin-peng (张金鹏)
ZHOU Feng (周枫)
GUO Jian-yi (郭剑毅)
Affiliations: The Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, China; Information Management Center, Yunnan University of Finance and Economics, Kunming 650221, China
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2019, Issue 12, pp. 2500-2505 (6 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (61662040, 61562049)
Keywords
word prediction
Lao part-of-speech tagging
hidden Markov model
semi-supervised learning