Abstract
Anaphora resolution is an important research topic in natural language processing. This paper presents an approach to resolving personal pronouns in Chinese text that combines instance-based learning with the Fuzzy Rough set model. The first phase is preprocessing: noun phrases are extracted, those whose gender and number features are inconsistent with the pronoun are filtered out, and the remaining noun phrases form the candidate antecedent set. Each noun phrase in this set is then labeled according to an attribute set that involves only shallow syntactic and semantic knowledge. The second phase uses concepts from the Fuzzy Rough set model to select the more representative instances and to reduce their attribute values, which improves the generalization capability of these instances. These two phases together constitute the learning stage. In the third phase, the selected instances are used to judge whether a newly input noun phrase is the antecedent of a pronoun. The approach was tested on a People's Daily corpus, and the results show that it is effective.
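The filtering step of the first phase can be illustrated with a minimal sketch. It is not the authors' implementation: the feature table `PRONOUN_FEATURES`, the `NounPhrase` type, and the rule that an "unknown" feature is compatible with any pronoun are all assumptions made for illustration, since the abstract does not specify how gender and number are represented.

```python
from dataclasses import dataclass

# Hypothetical pronoun feature table (an assumption; the paper's lexicon is
# not given in the abstract). Treating "他们" as strictly male is a
# simplification, since it can also refer to mixed groups.
PRONOUN_FEATURES = {
    "他": ("male", "singular"),
    "她": ("female", "singular"),
    "他们": ("male", "plural"),
    "她们": ("female", "plural"),
}


@dataclass(frozen=True)
class NounPhrase:
    text: str
    gender: str  # "male", "female", or "unknown"
    number: str  # "singular", "plural", or "unknown"


def agrees(value: str, required: str) -> bool:
    """A feature agrees if it matches or is unknown (assumed policy)."""
    return value == "unknown" or value == required


def candidate_set(pronoun: str, noun_phrases: list[NounPhrase]) -> list[NounPhrase]:
    """Keep only noun phrases whose gender and number agree with the pronoun."""
    gender, number = PRONOUN_FEATURES[pronoun]
    return [
        phrase
        for phrase in noun_phrases
        if agrees(phrase.gender, gender) and agrees(phrase.number, number)
    ]


phrases = [
    NounPhrase("王先生", "male", "singular"),
    NounPhrase("李女士", "female", "singular"),
    NounPhrase("代表们", "unknown", "plural"),
    NounPhrase("记者", "unknown", "unknown"),
]
kept = [phrase.text for phrase in candidate_set("她", phrases)]
print(kept)
```

In this sketch, noun phrases with unknown features are kept rather than discarded, so the later learning phases decide on them; a stricter filter would be an equally plausible reading of the abstract.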
Source
《计算机科学》
CSCD
Peking University Core Journal
2010, Issue 1, pp. 245-250 (6 pages)
Computer Science
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60873077)