
Chinese Open-Domain Named Entity Boundary Identification Based on a Self-Training Method (基于自学习的汉语开放域命名实体边界识别)
Abstract: Named entity recognition is an important task in natural language processing and supports many downstream applications. This paper studies boundary identification for Chinese open-domain named entities. No training corpus currently exists for this task, and manual annotation is prohibitively expensive. The paper therefore first automatically annotates a large-scale Chinese proper noun corpus using bilingual parallel corpora and an English syntactic parser, and separately converts a Chinese dependency treebank into a noun compound corpus. A self-training method then merges the two corpora into a single named entity boundary identification corpus while training the boundary identification model. Experiments show that self-training makes full use of both corpora and improves both the precision and the recall of boundary identification.
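The corpus-merging idea in the abstract can be illustrated with a small sketch. The abstract does not specify the model, features, or confidence criterion, so everything below is an assumption made for illustration: a toy count-based tagger over B/I/O boundary tags, a hypothetical `threshold` parameter, and hypothetical helper names (`train`, `predict`, `relabel`, `self_train`). Each corpus is treated as partially labeled: the proper noun corpus marks noun compounds as `O`, and the noun compound corpus marks proper nouns as `O`. A model trained on one corpus fills in confident labels on the other, and a final model is trained on the union.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count how often each token carries each boundary tag ('B', 'I', 'O')."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for token, tag in sentence:
            counts[token][tag] += 1
    return counts


def predict(model, token):
    """Return (tag, confidence); unseen tokens fall back to 'O' with confidence 0."""
    if token not in model:
        return "O", 0.0
    tag, n = model[token].most_common(1)[0]
    return tag, n / sum(model[token].values())


def relabel(model, corpus, threshold):
    """Upgrade 'O' tokens to the model's entity tag when it is confident.

    Existing entity tags are never overwritten; new lists are returned.
    """
    out = []
    for sentence in corpus:
        new_sentence = []
        for token, tag in sentence:
            if tag == "O":
                pred, conf = predict(model, token)
                if pred != "O" and conf >= threshold:
                    tag = pred
            new_sentence.append((token, tag))
        out.append(new_sentence)
    return out


def self_train(corpus_a, corpus_b, threshold=0.9, rounds=2):
    """Illustrative merging loop (an assumption, not the paper's exact procedure)."""
    merged_a, merged_b = corpus_a, corpus_b
    for _ in range(rounds):
        # Train both models first so each round uses the previous round's labels.
        model_a = train(merged_a)
        model_b = train(merged_b)
        merged_b = relabel(model_a, merged_b, threshold)
        merged_a = relabel(model_b, merged_a, threshold)
    merged = merged_a + merged_b
    return train(merged), merged


# Toy proper noun corpus: only "北京" is annotated as an entity.
corpus_a = [
    [("他", "O"), ("在", "O"), ("北京", "B"), ("工作", "O")],
    [("北京", "B"), ("很", "O"), ("大", "O")],
    [("他", "O"), ("研究", "O"), ("语言", "O"), ("模型", "O")],
]
# Toy noun compound corpus: only "语言 模型" is annotated as an entity.
corpus_b = [
    [("北京", "O"), ("的", "O"), ("语言", "B"), ("模型", "I")],
    [("语言", "B"), ("模型", "I"), ("很", "O"), ("大", "O")],
]

model, merged = self_train(corpus_a, corpus_b)
```

In this toy run, the merged corpus carries both kinds of boundary annotation in every sentence, which is the effect the abstract attributes to the merging step. A real system would replace the count-based tagger with a sequence model and a calibrated confidence estimate.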
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2014, No. 4, pp. 1-4, 8 (5 pages)
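Funding: National Natural Science Foundation of China (61133012, 61273321); National High Technology Research and Development Program of China (863 Program), Frontier Technology Research Project (2012AA011102)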
Keywords: open-domain named entity recognition; self-training; training corpus combination