Abstract
Rhetorical Structure Theory (RST) is an important theory of discourse structure, and rhetorical structure relations (RSRs) are at its core. Building on RST and the characteristics of Chinese text, this paper proposes a hierarchical taxonomy of Chinese-oriented RSRs together with multiple definitions for them. To address the ambiguity that annotators encounter, it also proposes an unambiguous annotation method. To ease annotation, a Java GUI-based tagging tool, RSTTagger, was designed and implemented. The tool takes tuples of keywords from each sentence's subject-predicate structure as its elementary tagging units and tags bottom-up, level by level, until the result is a complete rhetorical structure relation tree. To check annotation consistency, 160 Chinese foreign-trade texts were selected as the tagging corpus, and 50 of them, chosen at random, were tagged by different annotators; inter-annotator agreement reached 76.63%. The tagging framework can be applied to corpora in other domains, and the 160 annotated texts can serve as a base corpus for research on discourse structure theory.
Authors
侯圣峦
费超群
张书涵
HOU Shengluan; FEI Chaoqun; ZHANG Shuhan (Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China)
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2019, Issue 7, pp. 20-30 (11 pages)
Journal of Chinese Information Processing
Funding
National Key Research and Development Program of China (2016YFB1000902)
National Natural Science Foundation of China (61232015, 61472412, 61621003)
Keywords
natural language processing (自然语言处理)
rhetorical structure theory (修辞结构理论)
rhetorical structure relation (修辞结构关系)
discourse parsing (篇章结构分析)