Abstract
The success of manifold learning algorithms depends heavily on choosing a suitable neighborhood size parameter, yet selecting it efficiently remains difficult in practice. To address this, the paper proposes a method that selects the neighborhood size incrementally. By the local Euclidean property of a manifold, every neighborhood in the neighborhood graph should be linear or nearly linear. When the neighborhood size is suitable, the linearity measures of all neighborhoods stay small and fall into a single cluster; once it becomes unsuitable, some neighborhoods are no longer linear and the measures no longer form one cluster. The method therefore runs weighted Principal Component Analysis (PCA) on each neighborhood in the neighborhood graph, takes the reconstruction error as that neighborhood's linearity measure, and computes the corresponding Bayesian Information Criterion (BIC) to detect the number of clusters among all reconstruction errors, which allows the neighborhood size to be selected incrementally. Experimental results show that the method requires no extra parameters and runs efficiently.
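The pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the PCA weighting scheme, the clustering model behind the BIC, or the stopping rule. The sketch assumes Gaussian kernel weights, a hard-assignment one-dimensional Gaussian model with shared variance that compares one cluster against two, and stopping at the first neighborhood size whose errors split into two clusters. The function names `reconstruction_errors`, `bic_num_clusters`, and `select_neighborhood_size`, and the parameters `d`, `k_min`, and `k_max`, are hypothetical.

```python
import numpy as np


def reconstruction_errors(X, k, d):
    """Weighted-PCA reconstruction error of every k-NN neighborhood.

    X: (n, D) data, k: neighborhood size, d: assumed intrinsic dimension.
    """
    n = X.shape[0]
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    errs = np.empty(n)
    for i in range(n):
        idx = np.argsort(dist2[i])[:k + 1]  # the point itself plus its k neighbors
        P = X[idx]
        # Assumption: Gaussian kernel weights; the paper's weights may differ.
        sigma2 = dist2[i, idx].mean() + 1e-12
        w = np.exp(-dist2[i, idx] / (2.0 * sigma2))
        w /= w.sum()
        mu = w @ P
        C = (P - mu).T @ (w[:, None] * (P - mu))  # weighted covariance
        evals = np.linalg.eigvalsh(C)             # ascending eigenvalues
        errs[i] = evals[:-d].sum()                # variance outside the top-d subspace
    return errs


def bic_num_clusters(values):
    """Return 1 or 2, whichever cluster count has the lower BIC.

    Assumption: 1-D Gaussian clusters with shared variance and hard assignment;
    the two-cluster model tries every split of the sorted values.
    """
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    floor = 1e-12  # variance floor to avoid log(0)

    def bic(loglik, n_params):
        return -2.0 * loglik + n_params * np.log(n)

    var1 = max(v.var(), floor)
    ll1 = -0.5 * n * (np.log(2.0 * np.pi * var1) + 1.0)
    best = (bic(ll1, 2), 1)  # parameters: mean, variance
    for s in range(1, n):
        a, b = v[:s], v[s:]
        var2 = max((a.var() * a.size + b.var() * b.size) / n, floor)
        ll2 = (a.size * np.log(a.size / n) + b.size * np.log(b.size / n)
               - 0.5 * n * (np.log(2.0 * np.pi * var2) + 1.0))
        best = min(best, (bic(ll2, 4), 2))  # two means, variance, mixing weight
    return best[1]


def select_neighborhood_size(X, d, k_min, k_max):
    """Grow k until the errors stop forming one cluster; return the last good k."""
    chosen = None
    for k in range(k_min, k_max + 1):
        if bic_num_clusters(reconstruction_errors(X, k, d)) > 1:
            break
        chosen = k
    return chosen
```

A call such as `select_neighborhood_size(X, d=2, k_min=4, k_max=30)` would return the largest neighborhood size tried before the reconstruction errors split into two clusters, or `None` if even `k_min` fails the test. The intrinsic dimension `d` and the search range are inputs of this sketch only, so it does not reproduce the parameter-free property claimed in the abstract.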
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
2014, No. 8, pp. 194-200 (7 pages)
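Funding
National Natural Science Foundation of China (61202285)
Basic and Frontier Technology Research Program of Henan Province (112300410201)
Key Basic Research Program of Science and Technology of the Education Department of Henan Province (13B520899)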
Keywords
manifold learning
neighborhood size
local Euclidean property
weighted Principal Component Analysis (PCA)
reconstruction error
Bayesian Information Criterion (BIC)