Unsupervised translation disambiguation based on mining Web relatedness of bilingual words

Abstract: To alleviate the difficulty of acquiring disambiguation knowledge and the data sparseness problem in translation disambiguation, this paper proposes an unsupervised translation disambiguation method that mines the relatedness of bilingual words from the Web. The method extends the indirect association of bilingual words in a corpus to the Web, yielding a Web-based indirect association model of bilingual words. Building on this model, it proposes a disambiguation method based on the Web relatedness of bilingual words: it constructs different queries, extracts the page counts of the pages returned by a search engine, and computes point-wise mutual information as the relatedness between words, which then drives the disambiguation decision. The best performance of the method (P_mar = 0.464) exceeds that of TorMd, the best comparable unsupervised system on Task #5 of the Semeval-2007 international semantic evaluation.
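The abstract's pipeline (queries, page counts, point-wise mutual information, decision) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `pmi` and `choose_translation`, the `page_count` lookup, the total page number `total_pages`, the choice to sum PMI over context words, and all counts in the toy table are assumptions made for illustration; a real system would obtain the counts from a search engine.

```python
import math
from typing import Callable, Iterable, Optional


def pmi(count_xy: int, count_x: int, count_y: int, total_pages: int) -> float:
    """Point-wise mutual information estimated from page counts.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with each probability
    approximated as a page count divided by total_pages (an assumed constant).
    """
    if min(count_xy, count_x, count_y) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2((count_xy * total_pages) / (count_x * count_y))


def choose_translation(
    context_words: Iterable[str],
    candidates: Iterable[str],
    page_count: Callable[[str], int],
    total_pages: int,
) -> Optional[str]:
    """Pick the candidate translation most related to the context words.

    Scoring by the sum of PMI values is an illustrative assumption;
    pairs with no evidence are skipped rather than penalised.
    """
    context = list(context_words)
    best, best_score = None, float("-inf")
    for cand in candidates:
        score = 0.0
        for word in context:
            value = pmi(
                page_count(f'"{word}" "{cand}"'),  # joint query
                page_count(f'"{word}"'),
                page_count(f'"{cand}"'),
                total_pages,
            )
            if value != float("-inf"):
                score += value
        if score > best_score:
            best, best_score = cand, score
    return best


# Hypothetical page counts standing in for search-engine results.
toy_counts = {
    '"银行"': 1000, '"贷款"': 800,
    '"interest"': 2000, '"hobby"': 500,
    '"银行" "interest"': 200, '"贷款" "interest"': 160,
    '"银行" "hobby"': 5, '"贷款" "hobby"': 2,
}
best = choose_translation(
    ["银行", "贷款"], ["interest", "hobby"],
    lambda q: toy_counts.get(q, 0), total_pages=1_000_000,
)
print(best)
```

With these made-up counts, the context words 银行 (bank) and 贷款 (loan) co-occur with "interest" far more often than chance would predict, so that candidate receives the higher summed PMI.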

Authors: LIU Pengyuan, ZHAO Tiejun

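Affiliations: Institute of Computational Linguistics, School of Information Science and Technology, Peking University; School of Computer Science and Technology, Harbin Institute of Technology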

Source: Chinese High Technology Letters (《高技术通讯》), 2010, No. 4, pp. 349-354 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

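Funding: Supported by the 973 Program (2004CB318102), the National Natural Science Foundation of China (60903063), and the China Postdoctoral Science Foundation (20090450007).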

Keywords: unsupervised translation disambiguation, bilingual word relatedness, page count, indirect association, Web-based

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

