Multiple Language Text Classification Method Based on Statistical Dictionary and Feature Enhancing (cited by: 3)
Abstract: Multiple language text classification (MLTC) is usually handled as a set of independent single-language classification problems. To address this, a feature-enhanced multilingual text classification method built on a statistical bilingual dictionary is proposed. When classifying texts in one language, the method also takes training texts in the other languages into account, so that training texts are available across the multilingual collection, which relaxes the requirements of MLTC. Feature enhancing is a cross-examination process: after the chi-square statistics of all features in both languages have been obtained, the discriminative power of each feature in one language is re-evaluated using the discriminative power of its related features in the other language, which improves the reliability of classification. Either Chinese or English is chosen as the target language in the experiments. The experimental results show that the proposed method achieves higher classification accuracy and is less sensitive to the size of the training set.
Authors: GONG Jing (龚静); LI Ying-jie (李英杰); HUANG Xin-yang (黄欣阳) (Department of Public Basic Course, Hunan Polytechnic of Environment and Biology, Hengyang, Hunan 421005, China; Computer School, University of South China, Hengyang, Hunan 421001, China)
Source: Journal of Southwest China Normal University (Natural Science Edition) (《西南师范大学学报(自然科学版)》), 2018, No. 9, pp. 45-50 (6 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (60572137); Hunan Provincial Department of Education projects (12C1056, 17C0599)
Keywords: multiple language text classification; bilingual dictionary; feature enhancing; cross examination; sensitivity
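
As a rough illustration of the cross-examination idea described in the abstract, the Python sketch below computes a chi-square score for every term in each language and then re-evaluates each term using the scores of its dictionary translations. It is not the authors' implementation: the abstract does not state the combination rule, so the weighted average in `enhance`, the `weight` parameter, and the dictionary layout (each term mapped to translation probabilities) are all assumptions made for illustration.

```python
from collections import Counter


def chi_square(docs, labels, target):
    """Chi-square statistic of every term with respect to class `target`.

    docs is a list of token lists; labels is the parallel list of classes.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_pos, df_all = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):          # document frequency, not term count
            df_all[term] += 1
            if y == target:
                df_pos[term] += 1
    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]                  # in class, contains term
        b = df - a                        # outside class, contains term
        c = n_pos - a                     # in class, lacks term
        d = n - n_pos - b                 # outside class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def enhance(own, other, dictionary, weight=0.5):
    """Hypothetical cross-examination step (an assumed combination rule).

    own / other: chi-square scores in the target and the other language.
    dictionary: term -> {translation: probability} from a statistical
    bilingual dictionary. Terms without an entry keep their own score.
    """
    enhanced = {}
    for term, score in own.items():
        translations = dictionary.get(term)
        if not translations:
            enhanced[term] = score
            continue
        support = sum(p * other.get(tr, 0.0) for tr, p in translations.items())
        enhanced[term] = (1 - weight) * score + weight * support
    return enhanced


# Toy data, invented purely for this example.
en_docs = [["stock", "market"], ["stock", "price"],
           ["football", "match"], ["match", "price"]]
en_labels = ["fin", "fin", "sport", "sport"]
zh_docs = [["股票", "市场"], ["比赛", "足球"]]
zh_labels = ["fin", "sport"]
toy_dictionary = {"stock": {"股票": 1.0}, "price": {"价格": 1.0}}

en_scores = chi_square(en_docs, en_labels, "fin")
zh_scores = chi_square(zh_docs, zh_labels, "fin")
en_enhanced = enhance(en_scores, zh_scores, toy_dictionary)
```

In this sketch a term whose translations are also discriminative in the other language has its score confirmed, while a term with no dictionary entry is left unchanged; other rules (for example, a product or a maximum) would fit the abstract's description equally well.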