Abstract
Support vector machines (SVMs) are conventionally trained in a supervised manner on randomly selected labeled samples. As data volumes and data-collection capabilities grow, this demands a heavy manual labeling effort and creates considerable difficulty in practical applications. This paper proposes an active learning strategy for SVMs based on labeling the most informative samples, motivated by the statistical query model and unsupervised clustering. First, unsupervised clustering selects a small set of samples to label, and an initial SVM classifier is trained on that set. The classifier then actively queries the unlabeled samples of greatest interest for labeling, gradually enlarging the labeled set, and is updated on the enlarged set; this is repeated until the classifier reaches the desired performance. Experimental results show that, with essentially no loss in classification accuracy compared with a passive SVM trained directly on a large labeled set, active learning needs far fewer labeled samples than random selection, which greatly reduces the labeling effort and also speeds up training.
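The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the query criterion, the kernel, or the clustering algorithm, so the sketch assumes k-means for the clustering step, a linear SVM without a bias term trained by subgradient descent on the hinge loss, and "smallest distance to the decision boundary" as the measure of informativeness. The toy grid data, the `oracle` function standing in for a human annotator, and all other names and hyperparameters are hypothetical.

```python
# Toy sketch: k-means-seeded, margin-based active learning for a linear SVM.
# All names, data, and hyperparameters are illustrative assumptions.

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans_seeds(X, init_idx, iters=10):
    """Run Lloyd's k-means, then return the index of the point nearest each centroid."""
    centroids = [X[i] for i in init_idx]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for x in X:
            nearest = min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
            groups[nearest].append(x)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return [min(range(len(X)), key=lambda i: sq_dist(X[i], c)) for c in centroids]

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise the L2-regularised hinge loss by subgradient descent (no bias term)."""
    w = [0.0] * len(points[0])
    for _ in range(epochs):
        for x, y in zip(points, labels):
            violated = y * dot(w, x) < 1           # sample inside the margin?
            w = [wj * (1 - lr * lam) for wj in w]  # regularisation shrink
            if violated:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
    return w

def active_learn(X, oracle, n_queries):
    """Seed with cluster representatives, then repeatedly label the least certain sample."""
    labeled = kmeans_seeds(X, init_idx=[0, len(X) - 1])
    fit = lambda: train_linear_svm([X[i] for i in labeled],
                                   [oracle(X[i]) for i in labeled])
    w = fit()
    for _ in range(n_queries):
        pool = [i for i in range(len(X)) if i not in labeled]
        # Assumed criterion: the unlabeled sample closest to the decision boundary.
        query = min(pool, key=lambda i: abs(dot(w, X[i])))
        labeled.append(query)  # the oracle supplies its label on the next fit
        w = fit()
    return w, labeled

# Hypothetical data: two groups of grid points separated along the first axis.
xs = [-3.0, -2.5, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [-0.5, -0.25, 0.0, 0.25, 0.5]
X = [(x, y) for x in xs for y in ys]

def oracle(p):
    """Stand-in for the human annotator: returns the true label of a sample."""
    return 1 if p[0] > 0 else -1

w, labeled = active_learn(X, oracle, n_queries=6)
accuracy = sum((1 if dot(w, p) > 0 else -1) == oracle(p) for p in X) / len(X)
print(len(labeled), accuracy)
```

On this toy grid the two seeds are the cluster centres, and every later query lands on a point adjacent to the gap between the classes, which is the behaviour the strategy relies on: labels are spent only where the current classifier is least certain. A kernel SVM, as the keywords suggest the paper uses, would replace `train_linear_svm` and the `dot(w, x)` score with a kernel decision function.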
Source
《信号处理》
CSCD
Peking University Core Journal (北大核心)
2008, No. 1, pp. 105-107 (3 pages)
Journal of Signal Processing
Keywords
active learning
kernel function
support vector machine
passive learning
unsupervised clustering