
Ensembles of Classifiers for Chinese Word Sense Disambiguation (cited by: 14)
Abstract: Word sense disambiguation (WSD) has long been both a central concern and a hard problem in natural language processing, and ensembles of classifiers have been identified as one of four current directions in machine learning research. This paper presents a systematic study of nine ensemble methods applied to Chinese WSD: product rule, average, max, min, majority voting, rank-based voting, weighted voting, weighted probability, and best single combining. Of these, product, average, and max had not previously been applied to WSD. Support vector machine (SVM), naive Bayes, and decision tree are selected as the three component classifiers, and all three use the same four kinds of features: bag of words, words with position, parts of speech with position, and 2-gram collocations. Experiments are conducted on two datasets: 18 ambiguous words selected from the Modern Chinese semantically annotated corpus, and the multilingual Chinese-English lexical sample task of SemEval-2007. The results show that the three methods introduced to WSD here (product, average, and max) perform well: each achieves higher disambiguation accuracy than the best single classifier, SVM, and each outperforms the other six ensemble methods.
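Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2008, No. 8, pp. 1354-1361 (8 pages)
Funding: National Natural Science Foundation of China (60703063); National Social Science Foundation of China (08CYY016); National High-Tech Research and Development Program of China ("863" Program) (2007AA01Z198); National Key Basic Research Program of China ("973" Program) (2004CB318102)
Keywords: word sense disambiguation; ensemble of classifiers; average; max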
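
The fixed combination rules named in the abstract (product, average, max, min, majority voting) can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `combine`, the dict-based posterior format, and the example probabilities are all assumptions made for illustration. The other four methods (rank-based voting, weighted voting, weighted probability, best single combining) depend on details the abstract does not give, so they are left out.

```python
from collections import Counter
from math import prod


def combine(posteriors, rule="product"):
    """Pick a sense by combining per-classifier probabilities with a fixed rule.

    posteriors: list of dicts, one per classifier, mapping sense -> probability.
    Assumes (for this sketch) that every classifier scores the same senses.
    """
    senses = sorted(posteriors[0])
    if rule == "majority":
        # Each classifier votes for its own top sense; ties go to the
        # sense that received a vote first.
        votes = Counter(max(senses, key=lambda s: p[s]) for p in posteriors)
        return votes.most_common(1)[0][0]
    rules = {
        "product": prod,
        "average": lambda xs: sum(xs) / len(xs),
        "max": max,
        "min": min,
    }
    score = rules[rule]
    # Score each sense across classifiers, then return the best-scoring sense.
    return max(senses, key=lambda s: score([p[s] for p in posteriors]))


# Hypothetical posteriors for one test instance (not data from the paper).
svm = {"s1": 0.60, "s2": 0.40}
nb = {"s1": 0.10, "s2": 0.90}
dt = {"s1": 0.55, "s2": 0.45}
for rule in ("product", "average", "max", "min", "majority"):
    print(rule, combine([svm, nb, dt], rule))
```

With these made-up numbers, the four probability-based rules choose `s2` because one classifier is very confident, while majority voting chooses `s1` because two of the three classifiers rank it first. This is the kind of disagreement that makes the choice of combination rule matter.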

