Abstract
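Relevant standards and standard data are currently scarce. To address this, the linguistic characteristics of descriptions of geographic spatial relations in Chinese text are analyzed, and an annotation scheme for geographic spatial relations in Chinese text is proposed. Using GATE (General Architecture for Text Engineering) as the annotation tool and the Encyclopedia of China: Chinese Geography (《中国大百科全书中国地理》) as the text source, an annotated corpus of geographic spatial relations was built through cross-validation. The corpus provides a structured representation of geographic spatial relation descriptions in Chinese text and standardized test data for geographic spatial relation information extraction.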
Corpus annotation is a task that provides both reference and training material for method development and benchmark data sets annotated with a given annotation scheme. After an analysis of the linguistic characteristics, an annotation scheme is proposed for marking up linguistic expressions of spatial relations in Chinese text. The natural language processing software GATE (General Architecture for Text Engineering) is then introduced as the annotation tool. Based on the proposed annotation scheme, a corpus with the "Encyclopedia of China: Chinese Geography" as its source data is annotated by means of cross-validation to solve the problem of annotation inconsistency. The goals are to realize a structured representation of geographical spatial relations described in natural language and to provide standard training and test data for their extraction.
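The abstracts mention cross-validation between annotators to resolve inconsistent annotations but do not describe the procedure or the tag set. The Python sketch that follows illustrates one common way to cross-check two annotators: represent each annotation as a labeled character span (offset-based stand-off annotation, similar in spirit to how GATE records annotations), compare the two sets by exact match, compute pairwise F1, and list the disagreements for adjudication. The span format, the labels `Entity` and `Direction`, the function `cross_check`, and the example sentence are hypothetical and are not the paper's annotation scheme.

```python
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical format: a labeled span with 0-based, end-exclusive character offsets."""
    start: int
    end: int
    label: str


def cross_check(a: List[Annotation], b: List[Annotation]) -> Dict[str, object]:
    """Compare two annotators' spans by exact (start, end, label) match."""
    set_a, set_b = set(a), set(b)
    agreed = set_a & set_b
    # Pairwise F1 = 2|A ∩ B| / (|A| + |B|); symmetric, so neither annotator
    # has to be treated as the gold standard. Two empty sets count as full agreement.
    denom = len(set_a) + len(set_b)
    f1 = 2 * len(agreed) / denom if denom else 1.0
    return {
        "agreed": sorted(agreed),
        "only_a": sorted(set_a - set_b),  # spans to adjudicate
        "only_b": sorted(set_b - set_a),
        "f1": f1,
    }


# Hypothetical example: "Nanjing is located in southwestern Jiangsu Province".
text = "南京位于江苏省西南部"
ann_a = [Annotation(0, 2, "Entity"), Annotation(4, 7, "Entity"), Annotation(7, 10, "Direction")]
ann_b = [Annotation(0, 2, "Entity"), Annotation(4, 7, "Entity"), Annotation(7, 9, "Direction")]

report = cross_check(ann_a, ann_b)
for ann in report["only_a"] + report["only_b"]:
    print(ann.label, text[ann.start:ann.end])
print("pairwise F1:", round(report["f1"], 3))
```

Exact matching is deliberately strict: here the two annotators disagree only on whether the direction span is 西南 or 西南部, yet both spans count as mismatches. A partial-overlap criterion would instead report this as a single boundary disagreement.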
Source
Acta Geodaetica et Cartographica Sinica (《测绘学报》)
EI
CSCD
Peking University Core Journal (北大核心)
2012, No. 3, pp. 468-474 (7 pages)
Funding
National Natural Science Foundation of China (40971231)
Jiangsu Province Graduate Research and Innovation Program (CXLX11_0874)
Keywords
natural languages
Chinese texts
spatial relations
annotation schemes
annotated corpus