Abstract
Making effective use of massive amounts of data is an important task in machine learning. The traditional support vector machine (SVM) is a supervised learning method that requires a large number of labeled samples for training, yet labeled samples are limited in number and difficult to obtain. Combining the ideas of the Co-training and Tri-training algorithms, this paper presents a semi-supervised SVM classification method. The method uses two SVM classifiers with different parameters to label the unlabeled samples, and adds the samples labeled with high confidence to the labeled sample set. Both theoretical analysis and computer simulation results indicate that the algorithm can make effective use of a large number of unlabeled samples, and that adding unlabeled samples effectively improves classification accuracy.
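The labeling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions: a small linear SVM trained by subgradient descent stands in for the paper's SVMs, the "different parameters" are taken to be two regularisation strengths (`lams`), and a sample counts as high-confidence when both classifiers place it on the same side and outside the margin (`threshold`). All function and parameter names are hypothetical.

```python
# Illustrative sketch only: a tiny linear SVM stands in for the paper's SVMs.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, lam, eta=0.1, epochs=200):
    """Fit (w, b) by full-batch subgradient descent on the L2-regularised hinge loss.

    X is a list of feature lists, y a list of labels in {-1, +1}.
    lam is the regularisation strength (assumed to be the parameter
    that differs between the two classifiers).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wi - eta * gj for wi, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def decision(model, x):
    """Signed score of x under a trained (w, b) model."""
    w, b = model
    return dot(w, x) + b

def semi_supervised_svm(X_lab, y_lab, X_unlab, lams=(0.01, 0.1),
                        threshold=1.0, max_rounds=10):
    """Grow the labeled set with samples both SVMs label confidently.

    Returns the extended labeled samples, their labels, and the
    unlabeled samples that were never added.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        # Two classifiers that differ only in their parameter setting.
        models = [train_linear_svm(X_lab, y_lab, lam) for lam in lams]
        confident, rest = [], []
        for x in pool:
            scores = [decision(m, x) for m in models]
            same_side = all(s > 0 for s in scores) or all(s < 0 for s in scores)
            # Assumed confidence rule: agreement plus a score outside the margin.
            if same_side and min(abs(s) for s in scores) >= threshold:
                confident.append((x, 1 if scores[0] > 0 else -1))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new to add, so retraining would change nothing
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = rest
    return X_lab, y_lab, pool
```

Calling `semi_supervised_svm` with a few labeled points and a pool of unlabeled points returns the enlarged labeled set together with any samples that neither round labeled confidently. Samples close to the decision boundary stay in the pool, which limits the risk of adding wrong labels to the training set.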
Source
《计算机技术与发展》
2010, No. 10, pp. 115-117, 121 (4 pages in total)
Computer Technology and Development
Funding
National Natural Science Foundation of China (40671133)
Keywords
semi-supervised learning
support vector machine (SVM)
genetic algorithm (GA)