Abstract
Entity Mention Detection (EMD) is the task of recognizing all mentions of entities in a document, including named, nominal, and pronominal mentions. This paper proposes an approach to Chinese entity mention detection that integrates multi-level features into the Conditional Random Fields (CRFs) framework. The features include characters, pinyin (phonetic symbols), words and their part-of-speech tags, lists of various proper names, and frequency statistics. All EMD subtasks are organized into a three-stage pipeline in which three different CRF classifiers label different attributes of each mention sequentially in a predefined order. The system described here was submitted to the NIST ACE07 Chinese EMD evaluation, where its ACE Value ranked second.
Source
《中文信息学报》
CSCD
PKU Core Journals (北大核心)
2007, Issue 5, pp. 126-130 (5 pages)
Journal of Chinese Information Processing
Funding
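National Natural Science Foundation of China (60473140)
National 863 High-Tech Program of China (2006AA01Z154)
Program for New Century Excellent Talents, Ministry of Education of China (NCET-05-0287)
National 985 Project of China (985-2-DB-C03)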
Keywords
computer application
Chinese information processing
entity mention detection
multi-task labeling
conditional random fields
ACE evaluation