Abstract
To alleviate the difficulty of acquiring disambiguation knowledge and the data sparseness problem in translation disambiguation, this paper proposes an unsupervised, Web-based method that mines the relatedness between bilingual words. The method extends the indirect association of bilingual words in a corpus to the Web, yielding a Web-based indirect association model of bilingual words. Building on this model, it proposes a disambiguation method based on Web-derived bilingual word relatedness: it constructs different queries, extracts the page counts of the pages returned by a search engine, and uses point-wise mutual information to compute the relatedness between words, which then drives the disambiguation decision. The method's best performance (P_mar = 0.464) exceeds that of TorMd, the best comparable unsupervised system on Task #5 of the Semeval-2007 international semantic evaluation.
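The relatedness computation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact query construction or scoring formula, so the query format (a context word and a candidate translation joined by a space), the summation over context words, the zero-count handling, and the names `pmi`, `choose_translation`, `page_count`, and `total_pages` are all assumptions made for illustration. The page counts are toy numbers, not search engine results.

```python
import math


def pmi(count_xy, count_x, count_y, total_pages):
    """Point-wise mutual information estimated from page counts.

    Probabilities are approximated as count / total_pages, so
    PMI = log2( P(x, y) / (P(x) * P(y)) ).
    Assumption: a zero count is treated as "no evidence" and scores 0.0.
    """
    if count_xy <= 0 or count_x <= 0 or count_y <= 0:
        return 0.0
    return math.log2((count_xy * total_pages) / (count_x * count_y))


def choose_translation(context_words, candidates, page_count, total_pages):
    """Return the candidate translation with the highest summed PMI.

    `page_count` is a hypothetical callable mapping a query string to a
    page count; in practice it would wrap a search engine lookup.
    """
    def score(candidate):
        return sum(
            pmi(
                page_count(f"{word} {candidate}"),  # co-occurrence query
                page_count(word),
                page_count(candidate),
                total_pages,
            )
            for word in context_words
        )

    return max(candidates, key=score)


# Toy page counts (invented for illustration only).
TOY_COUNTS = {
    "建筑": 10_000,
    "material": 50_000,
    "data": 80_000,
    "建筑 material": 2_000,
    "建筑 data": 500,
}


def toy_page_count(query):
    """Look up a query in the toy table; unseen queries count as zero."""
    return TOY_COUNTS.get(query, 0)


best = choose_translation(["建筑"], ["material", "data"], toy_page_count, 1_000_000)
```

Here a source-language context word is paired with each target-language candidate, and the candidate whose pairings co-occur more often than chance would predict is selected. Passing the page-count lookup as a function keeps the scoring logic independent of any particular search engine.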
Source
Chinese High Technology Letters (《高技术通讯》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2010, Issue 4, pp. 349-354 (6 pages)
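Funding
National Basic Research Program of China (973 Program, 2004CB318102)
National Natural Science Foundation of China (60903063)
China Postdoctoral Science Foundation (20090450007)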
Keywords
unsupervised translation disambiguation
bilingual word relatedness
page count
indirect association
Web-based