Abstract
Named entity recognition (NER) is an important task in information extraction. In medical research, automatically identifying named entities in electronic medical records and producing structured text that supports medical decision-making has become an important research topic. Performing word segmentation and entity recognition as separate steps lets lower-level errors accumulate and propagate upward, and fails to make full use of joint information. To address this problem, this paper proposes a two-in-one (unified) character-based tagging method, which treats recognition as a character sequence tagging process and uses a conditional random field (CRF) model to recognize named entities in medical records. Experimental results show that the unified character-based tagging method substantially improves named entity recognition performance.
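The character-tagging formulation described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical tag scheme in which each character receives a combined label of the form position-type, where the position (B, M, E, S) encodes the word boundary and the type is the entity class or O for non-entity words. The abstract does not specify the paper's actual tag set or features, so the function name `char_tags`, the `DISEASE` label, and the example sentence are illustrative assumptions only. Sequences labeled this way would then serve as training data for a CRF.

```python
from typing import List, Optional, Tuple

# Assumed tag scheme (not taken from the paper): "<position>-<type>".
#   position: B = first char of a word, M = middle, E = last, S = single-char word
#   type:     an entity label such as "DISEASE", or "O" for non-entity words
def char_tags(segments: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Turn (word, entity_label) segments into per-character combined tags."""
    tagged: List[Tuple[str, str]] = []
    for word, label in segments:
        kind = label if label is not None else "O"
        if len(word) == 1:
            positions = ["S"]
        else:
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        # One tag per character carries both segmentation and entity information.
        for ch, pos in zip(word, positions):
            tagged.append((ch, f"{pos}-{kind}"))
    return tagged


# Hypothetical annotated sentence: "患者 / 有 / 高血压", with "高血压" marked as a disease.
example = [("患者", None), ("有", None), ("高血压", "DISEASE")]
tags = char_tags(example)
for ch, tag in tags:
    print(ch, tag)
```

Because every character carries both the boundary and the entity type in a single label, one sequence model can predict segmentation and entity class jointly, which is the motivation the abstract gives for avoiding a separate segmentation step.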
Source
《中国卫生信息管理杂志》
2017, Issue 4, pp. 552-556 (5 pages)
Chinese Journal of Health Informatics and Management
Keywords
Named entity recognition
Information extraction
Two-in-one (unified) tagging
Character-based tagging
Conditional random field