Abstract
This paper proposes a bilingual approach, based on a Chinese-English comparable corpus, to detecting and resolving zero pronouns in Chinese discourse. It first defines the concept of an English equivalent sentence. Equivalent sentences are then used to redefine the distance between sentences and to extract bilingual word-alignment features. Building on a baseline system, the study addresses both zero pronoun detection and zero pronoun resolution from a bilingual perspective. Experiments on the OntoNotes 5.0 corpus show that the proposed approach significantly outperforms the state-of-the-art system.
Source
《北京大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2017, Issue 2, pp. 279-286 (8 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
Supported by the National Natural Science Foundation of China (61333018, 61472264, 61305088)
Keywords
Chinese zero pronoun
bilingual
equivalent sentence
detection
resolution