English-Chinese Name Transliteration Based on Optimization of Syllabification and Phrase Table

Abstract: This paper casts English-Chinese personal name transliteration as a translation problem whose basic unit is the syllable. Contiguous syllable sequences are treated as phrases, and a phrase-based statistical machine translation method is used to produce the transliteration. First, to address problems in existing syllabification methods, an improved syllabification method is proposed. Second, the phrase table is optimized by removing low-frequency entries and by applying the C-value method, which addresses the noise that a small training corpus introduces into the phrase table. Third, positional features of the first and last characters (words) of Chinese names are integrated, which reduces unreasonable character choices in the generated transliteration candidates. Finally, a two-stage syllabification method is proposed to mitigate transliteration errors caused by overly coarse syllabification granularity. Compared with the baseline method, transliteration accuracy (ACC) improves from 63.78% to 67.56%.
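The abstract does not say how the frequency threshold and C-value are applied to phrase-table entries, so the following is only a minimal sketch of the general idea. It assumes the standard C-value formula of Frantzi, Ananiadou and Mima (2000), applied to syllable sequences: a sequence that is nested in longer candidates has the average frequency of those longer candidates subtracted from its own frequency, and the result is weighted by log2 of its length. The function names, the `min_freq` threshold, and the toy counts are all hypothetical and are not taken from the paper.

```python
import math


def is_nested(short, long):
    """True if `short` is a contiguous sub-sequence of the strictly longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq):
    """Standard C-value for each candidate in `freq` (tuple of syllables -> count)."""
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a`.
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            # Discount by the mean frequency of the longer candidates.
            f_a = f_a - sum(freq[b] for b in containers) / len(containers)
        # Note: log2(1) = 0, so single-syllable candidates always score 0 here.
        scores[a] = math.log2(len(a)) * f_a
    return scores


def prune_and_score(freq, min_freq=2):
    """Drop low-frequency candidates (assumed threshold), then score the rest."""
    kept = {a: f for a, f in freq.items() if f >= min_freq}
    return c_value(kept)


# Hypothetical syllable-sequence counts, for illustration only.
toy_counts = {
    ("jack", "son"): 8,
    ("jack", "son", "ville"): 6,
    ("son", "ville"): 6,
    ("tom", "son"): 1,
}
scores = prune_and_score(toy_counts)
for seq, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(seq), round(score, 3))
```

In this toy input, `("son", "ville")` never occurs outside the longer sequence, so its score drops to zero and it could be discarded as a phrase-table entry, while `("tom", "son")` is removed earlier by the frequency threshold.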
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 3, pp. 96-102 (7 pages)
Funding: National Natural Science Foundation of China (61173100, 61173101, 61272375); Natural Science Foundation of Fujian Province (2014J01218)
Keywords: English-Chinese name transliteration; syllabification; phrase table optimization; C-value