Abstract
Traditional active learning (AL) methods converge slowly when classifying large-scale unlabeled samples. To address this, an active learning training algorithm based on hierarchical clustering (HC), named HC_AL, is proposed. The large-scale unlabeled data are clustered hierarchically, and at each level of the hierarchy the center of each cluster is labeled, with that label standing in for the label of the whole cluster. Cluster centers found to be wrongly labeled at that level are then added to the training set. Experiments on data sets show good generalization ability and fast convergence. The results indicate that hierarchical, stepwise refinement greatly speeds up the convergence of active learning while retaining relatively satisfactory learning ability.
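The abstract gives only an outline of the procedure, so the following is a minimal sketch of the hierarchical refinement loop under stated assumptions, not the authors' implementation. Assumed here and not taken from the source: the hierarchy is built top-down by repeatedly bisecting clusters with a few 2-means iterations; the "center" of a cluster is the member closest to its mean; the classifier is 1-nearest-neighbour over the training set; a center counts as "wrongly labeled" when the current classifier disagrees with the annotator; and `oracle` is a hypothetical function standing in for the human annotator. All function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points, idx):
    """Coordinate-wise mean of the points selected by idx (idx must be non-empty)."""
    dim = len(points[idx[0]])
    return tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dim))

def center_index(points, idx):
    """Index of the cluster member closest to the cluster mean."""
    m = mean(points, idx)
    return min(idx, key=lambda i: sq_dist(points[i], m))

def bisect(points, idx, iters=10):
    """Split one cluster into at most two parts with a few 2-means iterations.
    Seeds are chosen deterministically as two mutually distant members."""
    b = max(idx, key=lambda i: sq_dist(points[i], points[idx[0]]))
    a = max(idx, key=lambda i: sq_dist(points[i], points[b]))
    ca, cb = points[a], points[b]
    left, right = list(idx), []
    for _ in range(iters):
        left = [i for i in idx if sq_dist(points[i], ca) <= sq_dist(points[i], cb)]
        right = [i for i in idx if sq_dist(points[i], ca) > sq_dist(points[i], cb)]
        if not left or not right:
            break  # cluster cannot be split further
        ca, cb = mean(points, left), mean(points, right)
    return [part for part in (left, right) if part]

def predict(points, train, x):
    """1-nearest-neighbour prediction; train maps point index -> label.
    Returns None while the training set is still empty."""
    if not train:
        return None
    nearest = min(train, key=lambda i: sq_dist(points[i], x))
    return train[nearest]

def hc_al(points, oracle, max_depth=3):
    """Refine level by level: query the label of each cluster center and keep
    only the centers that the current classifier gets wrong."""
    clusters = [list(range(len(points)))]
    train = {}
    for _ in range(max_depth):
        for idx in clusters:
            c = center_index(points, idx)
            if c in train:
                continue  # this center was already queried and stored
            label = oracle(points[c])  # hypothetical annotator call
            if predict(points, train, points[c]) != label:
                train[c] = label
        # Descend one level of the hierarchy by splitting every cluster.
        clusters = [part for idx in clusters for part in bisect(points, idx)]
    return train
```

Calling `hc_al(points, oracle)` on a non-empty list of coordinate tuples returns a dictionary from point index to queried label. Because only misclassified centers are stored, well-separated classes need few labeled points, which is one way to read the convergence claim in the abstract.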
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journals (北大核心)
2011, No. 8, pp. 2134-2137 (4 pages)
Funding
Supported by the Shanxi Province Youth Science Foundation (2011021013-2)
Keywords
Active Learning (AL)
Hierarchical Clustering (HC)
hierarchical refinement
stepwise refinement