Abstract
To address polysemy in Chinese, a restricted Boltzmann machine (RBM) is used to determine the intended sense of an ambiguous word from the linguistic knowledge carried by its context. The word form, part of speech, and semantic category of the four lexical units adjacent to the ambiguous word on its left and right are selected as disambiguation features, and an RBM is used to build the word sense disambiguation (WSD) model. The RBM parameters are optimized on the SemEval-2007: Task#5 training corpus together with the semantically annotated corpus of Harbin Institute of Technology. The WSD model is evaluated on the SemEval-2007: Task#5 test corpus. Experimental results show that the RBM-based WSD method achieves higher disambiguation accuracy than a Bayesian WSD classifier.
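The feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` type, the `context_features` helper, the padding symbol, and the split of the four adjacent units into two on the left and two on the right are all assumptions made for illustration, and the tag values in the usage example are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One lexical unit (hypothetical container, not from the paper)."""
    form: str  # word form
    pos: str   # part-of-speech tag
    sem: str   # semantic category code


# Placeholder used when the window runs past the sentence boundary (assumption).
PAD = Token("<PAD>", "<PAD>", "<PAD>")


def context_features(tokens: List[Token], target: int,
                     left: int = 2, right: int = 2) -> List[str]:
    """Collect word form, POS, and semantic category of the units around
    tokens[target]. The 2 + 2 window is an assumed reading of "four
    adjacent lexical units on the left and right"."""
    feats = []
    offsets = list(range(-left, 0)) + list(range(1, right + 1))
    for offset in offsets:
        i = target + offset
        tok = tokens[i] if 0 <= i < len(tokens) else PAD
        feats.append(f"form[{offset:+d}]={tok.form}")
        feats.append(f"pos[{offset:+d}]={tok.pos}")
        feats.append(f"sem[{offset:+d}]={tok.sem}")
    return feats


# Usage with made-up tokens and tags; the ambiguous word sits at index 1.
sentence = [Token("a", "n", "S1"), Token("b", "v", "S2"), Token("c", "n", "S3")]
features = context_features(sentence, target=1)
```

Each of the four positions contributes three features, so the sketch yields twelve string features per ambiguous word. Before they could feed the visible layer of an RBM, such features would typically need to be turned into a binary vector, for example by one-hot encoding; the abstract does not state which encoding the authors used.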
Authors
ZHANG Chun-xiang (张春祥)
LI Hai-rui (李海瑞)
GAO Xue-yao (高雪瑶)
School of Software and Microelectronics, Harbin University of Science and Technology, Harbin 150080, China; School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
Source
Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》)
Indexed in: CAS, PKU Core (北大核心)
2019, No. 5, pp. 116-121 (6 pages)
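Funding
National Natural Science Foundation of China (61502124, 60903082)
China Postdoctoral Science Foundation (2014M560249)
Natural Science Foundation of Heilongjiang Province (F2015041, F201420)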
Keywords
restricted Boltzmann machine
disambiguation features
word sense disambiguation
training corpus