Abstract
Semi-supervised learning combines supervised and unsupervised learning: it uses unlabeled data to improve a model built from labeled data, with the aim of reducing the large volume of annotated data that conventional machine learning tasks require and lowering the associated labor cost. In named entity recognition (NER) for Chinese electronic medical records, annotated data are scarce, the text is highly specialized, and manual annotation is expensive, so semi-supervised methods can be used to get more out of a small labeled set. This paper introduces the research background of NER for Chinese electronic medical records and related work on semi-supervised learning, and applies an improved Tri-Training algorithm to improve the performance of an NER model for Chinese electronic medical records.
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2017, Issue 6, pp. 132-134, 138 (4 pages)