Abstract
Multi-view feature selection fuses information from multiple views to obtain a representative feature subset, improving the efficiency of learning tasks such as classification and clustering. However, the features that describe an object are numerous, diverse, and interrelated. Selecting a feature subspace directly from the original features addresses the dimensionality problem in a simple way, but it fails to capture the structural information within the data and the associations among features. In addition, using a fixed similarity matrix and projection matrix tends to lose the correlation between views. To address these problems, an unsupervised multi-view feature selection algorithm based on similarity matrix learning and matrix alignment (SMLMA) is proposed. First, the algorithm constructs a similarity matrix for every view and obtains a consistent similarity matrix and a projection matrix through manifold learning, so as to discover and preserve the structural information of the multi-view data as fully as possible. Second, it applies matrix alignment to maximize the correlation between the similarity matrix and the kernel matrix, exploiting the relationships among views and reducing the information redundancy of the feature subset. Finally, it uses the Armijo search method to reach convergence quickly. Experimental results on four datasets (Caltech-7, NUS-WIDE-OBJ, Toy Animal, and MSRC-v1) show that, compared with single-view feature selection methods and several multi-view feature selection methods, SMLMA improves clustering accuracy by about 7.54% on average. The algorithm preserves the structural information of the data and the correlation among multi-view features well, and captures more high-quality features.
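Two of the building blocks named in the abstract, alignment between a similarity matrix and a kernel matrix and an Armijo line search, can be illustrated with a minimal sketch. This is not the SMLMA objective or its optimization procedure, which the abstract does not specify. The function names (`gaussian_similarity`, `alignment`, `armijo_step`), the Gaussian similarity, the linear kernel, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Dense Gaussian similarity matrix for the rows of X (assumed choice)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at zero to absorb round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def alignment(A, B):
    """Normalized Frobenius inner product <A, B>_F / (||A||_F * ||B||_F)."""
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))


def armijo_step(f, x, grad, direction, t0=1.0, beta=0.5, c=1e-4, max_iter=50):
    """Backtracking line search: shrink t until the Armijo condition holds.

    Accepts t when f(x + t*d) <= f(x) + c * t * <grad, d>.
    Returns the last tried t if max_iter is exhausted.
    """
    fx = f(x)
    slope = float(np.sum(grad * direction))  # negative for a descent direction
    t = t0
    for _ in range(max_iter):
        if f(x + t * direction) <= fx + c * t * slope:
            return t
        t *= beta
    return t


# Toy data: two hypothetical views of the same 20 samples.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(20, 5))
view_b = rng.normal(size=(20, 3))

S = gaussian_similarity(view_a)   # similarity matrix from one view
K = view_b @ view_b.T             # linear kernel matrix from another view
score = alignment(S, K)           # larger means the two matrices agree more

# One steepest-descent step on f(x) = ||x||^2 with an Armijo step size.
f = lambda x: float(np.sum(x ** 2))
x0 = np.array([1.0, 1.0])
g0 = 2.0 * x0
step = armijo_step(f, x0, g0, -g0)
x1 = x0 - step * g0
```

The alignment score is bounded by 1 in absolute value (Cauchy-Schwarz), so it can serve as a correlation-like quantity to maximize; the Armijo rule only guarantees sufficient decrease per step, and how the paper combines the two is not stated in the abstract.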
Authors
LI Bin (李斌); WAN Yuan (万源)
School of Science, Wuhan University of Technology, Wuhan 430070, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2022, No. 8, pp. 86-96 (11 pages)
Funding
Fundamental Research Funds for the Central Universities (2021III030JC).
Keywords
Multi-view
Unsupervised
Feature selection
Similarity matrix
Matrix alignment