Abstract
Semi-supervised learning and active learning both exploit unlabeled data to improve the recognition performance of supervised learning while keeping the cost of data labeling low. This paper combines active learning with the semi-supervised Tri-training algorithm to propose a new classification algorithm, in which the samples for active learning are selected by an Entropy Priority Sampling (EPS) algorithm. Experiments on UCI datasets and remote sensing data, run under different proportions of labeled training samples, show that the algorithm achieves good results when few labeled samples are available. Combining active learning with the Tri-training algorithm is therefore an effective way to improve classification performance and generalization.
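The abstract does not spell out how Entropy Priority Sampling scores a sample. As a minimal sketch only, the Python below assumes one common reading: the three Tri-training classifiers each vote on an unlabeled sample, the Shannon entropy of those votes measures disagreement, and the highest-entropy samples are sent for manual labeling first. The function names `vote_entropy` and `select_by_entropy` and the toy votes are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def vote_entropy(votes: Sequence[int], n_classes: int) -> float:
    """Shannon entropy (in nats) of the class labels voted by a committee."""
    total = len(votes)
    entropy = 0.0
    for c in range(n_classes):
        p = votes.count(c) / total
        if p > 0:  # skip classes with no votes; 0 * log(0) is treated as 0
            entropy -= p * math.log(p)
    return entropy


def select_by_entropy(committee_votes: List[Sequence[int]],
                      n_classes: int, k: int) -> List[int]:
    """Return the indices of the k samples with the highest vote entropy."""
    scores = [vote_entropy(v, n_classes) for v in committee_votes]
    # sorted() is stable, so ties keep their original sample order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical votes of three classifiers on four unlabeled samples.
votes = [
    (0, 0, 0),  # full agreement
    (0, 1, 2),  # full disagreement
    (1, 1, 0),  # partial disagreement
    (2, 2, 2),  # full agreement
]
queried = select_by_entropy(votes, n_classes=3, k=2)
```

Under this assumption, samples on which the three classifiers agree would remain candidates for Tri-training's automatic pseudo-labeling, while the queried samples are the ones the classifiers disagree on most.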
Source
《计算机工程》
CAS
CSCD
2014, Issue 6, pp. 215-218, 229 (5 pages in total)
Computer Engineering
Funding
Scientific Research Fund Project of the Yunnan Provincial Department of Education (2010Y290, 2012C098)