Abstract
Active learning has proved to be a successful machine learning approach, but its main drawback is that it attends only to the label information of samples and ignores their distribution. As a result, it is unstable, prone to falling into local optima, and highly sensitive to the choice of initial samples. This paper proposes an algorithm that combines sparse subspace clustering with active learning. Sparse subspace clustering is first used to recover the distribution of the original data, and that information then guides active learning in selecting its initial samples. This makes sample labeling more effective, improves the efficiency of active learning, and reduces its sensitivity to the initial samples. Several sets of simulation experiments show that the method effectively improves the performance of active learning.
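The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: cluster the unlabeled pool with a basic sparse subspace clustering recipe, then take one sample per cluster as the initial labeled set. Everything specific here is an assumption for illustration and not the authors' method: the function names `ssc_affinity` and `select_initial_samples`, the Lasso penalty `alpha`, the toy data, and the rule of picking the sample with the largest within-cluster affinity.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import Lasso


def ssc_affinity(X, alpha=0.01):
    """Sparse self-representation: write each sample as a sparse mix of the others."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)  # a sample may not represent itself
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        lasso.fit(X[others].T, X[i])  # columns of the design matrix are the other samples
        C[i, others] = lasso.coef_
    return np.abs(C) + np.abs(C).T  # symmetric, non-negative affinity


def select_initial_samples(X, n_clusters, seed=0):
    """Return one representative index per cluster, plus the cluster labels."""
    W = ssc_affinity(X)
    clusterer = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    )
    # The tiny constant keeps the affinity graph connected (an assumption of this sketch).
    labels = clusterer.fit_predict(W + 1e-8)
    initial = []
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        # Hypothetical rule: the member most strongly tied to its own cluster.
        within = W[np.ix_(members, members)].sum(axis=1)
        initial.append(int(members[np.argmax(within)]))
    return initial, labels


# Toy pool: two groups of points lying in two different 2-D subspaces of R^4.
rng = np.random.default_rng(0)
A = np.hstack([rng.uniform(0.5, 1.5, (20, 2)), np.zeros((20, 2))])
B = np.hstack([np.zeros((20, 2)), rng.uniform(0.5, 1.5, (20, 2))])
X = np.vstack([A, B])

initial, labels = select_initial_samples(X, n_clusters=2)
print("indices to label first:", initial)
```

In a full pipeline the returned indices would be sent for annotation first, and an ordinary query strategy such as uncertainty sampling would take over from that seed set. Picking one seed per cluster is what ties the initial labels to the data distribution rather than to a random draw.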
Authors
姜秀波
钟丽媛
宋曹根
JIANG Xiu-bo; ZHONG Li-yuan; SONG Cao-gen (Xuji Electric Energy Storage Technology Co., Ltd., Xuchang, Henan 461000, China; Mitsubishi Electric Shanghai Electric Elevator Co., Ltd., Shanghai 200230, China)
Source
《计算技术与自动化》 (Computing Technology and Automation)
2021, Issue 4, pp. 69-73 (5 pages)
Keywords
active learning
sparse subspace
clustering