
Full-words Automatic Word Sense Tagging Based on Unsupervised Learning Algorithm (基于无指导机器学习的全文词义自动标注方法)

Cited by: 2
Abstract: To implement automatic word sense tagging of full Chinese text, this paper adopts a new word sense tagging method based on unsupervised machine learning strategies. Four word sense disambiguation models were built and their test results compared. The best-performing model combines two unsupervised machine learning strategies and uses dependency grammar analysis to select contextual feature words. The resulting tagging method can be trained on a large-scale corpus, which largely addresses the data sparseness problem. It also offers high tagging accuracy, high speed, and good extensibility, making it suitable for word sense tagging of large-scale real-world text.
Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2006, No. 2, pp. 228-236 (9 pages)
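Funding: Supported by the Key Program of the National Natural Science Foundation of China (60435020) and the National Natural Science Foundation of China (60575042, 60573072)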
Keywords: sense tagging, unsupervised learning algorithm, naive Bayesian model, dependency grammar
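
The abstract and keywords name a naive Bayesian model that scores word senses from contextual feature words. The record gives no implementation details, so the following is only a minimal sketch of the generic naive Bayes decision rule with add-one smoothing, not the authors' model. The function names `train` and `tag`, the toy senses for the word "bank", and the sample contexts are all hypothetical. In an unsupervised setting the (sense, context) pairs would have to be derived automatically rather than hand-labeled, and how the paper does that is not stated here.

```python
import math
from collections import Counter


def train(examples):
    """Collect counts from (sense, context_words) pairs (hypothetical input format)."""
    sense_counts = Counter()
    word_counts = {}
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def tag(context, sense_counts, word_counts, vocab):
    """Return the sense maximizing log P(sense) + sum of log P(word | sense)."""
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)
        # Add-one smoothing so unseen context words do not zero out a sense.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy data, invented purely for illustration.
examples = [
    ("finance", ["money", "loan"]),
    ("finance", ["deposit", "money"]),
    ("river", ["water", "shore"]),
]
model = train(examples)
```

Calling `tag(["money"], *model)` picks the sense whose counts best explain the context word. Restricting `context` to words linked to the target by a dependency relation would be one way to mirror the feature selection the abstract mentions, though that step is not shown here.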
