Abstract
This paper proposes a multi-layer filtering algorithm that automatically extracts and aligns bilingual chunks from sentence-aligned Chinese-English parallel text. Different layers handle different chunks according to the distinct features those chunks exhibit in the bilingual corpus. Unlike conventional approaches, the algorithm requires no tagging, syntactic parsing, or lexical analysis, and does not even require word segmentation of the Chinese sentences. Preliminary experiments show good performance: the F-measure is 0.70 for chunk extraction and 0.80 for chunk alignment. Moreover, when the aligned bilingual chunks are used in a statistical machine translation system, the chunk-based system clearly outperforms a word-based baseline, improving translation quality by nearly 10%.
Source
《中文信息学报》
CSCD
PKU Core (Peking University core journal list)
2005, Issue 3, pp. 54-60 (7 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (60272041, 60121302)
Keywords
artificial intelligence
machine translation
multi-layer filtering
bilingual chunking and alignment