Abstract
High-dimensional data are hard to interpret intuitively and are difficult for machine learning and data mining algorithms to process effectively. To address this, a new nonlinear dimensionality reduction method, Discriminant Diffusion Maps Analysis (DDMA), is proposed. The method applies a discriminant kernel scheme within the diffusion maps framework: according to the sample class labels, the Gaussian kernel width is chosen between a within-class width and a between-class width, so that the kernel function effectively extracts the correlation features of the data and accurately describes the structure of the data space. DDMA was evaluated on a synthetic Swiss-roll test and in a simulation of a penicillin fermentation process, and compared with Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Kernel Principal Component Analysis (KPCA), Laplacian Eigenmaps (LE), and Diffusion Maps (DM). The results show that DDMA represents the high-dimensional data in a low-dimensional space while retaining the original characteristics of the data, and that the data structure it produces in the low-dimensional space is superior to that produced by the comparison methods, which verifies the effectiveness of the proposed scheme in terms of dimensionality reduction and feature extraction.
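The label-dependent kernel described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact kernel formula or the width values, so the Gaussian form exp(-d^2 / (2 sigma^2)), the function names `discriminant_kernel` and `diffusion_matrix`, and the parameters `sigma_within` and `sigma_between` below are assumptions for illustration only, not the authors' implementation. The sketch builds the kernel matrix and the row-normalized diffusion (Markov) matrix; in standard diffusion maps, the low-dimensional embedding would then be taken from the leading nontrivial eigenvectors of that matrix.

```python
import math

def discriminant_kernel(X, labels, sigma_within, sigma_between):
    """Gaussian kernel whose width depends on whether two samples share a label.

    Assumed form: K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)), where sigma is
    sigma_within for same-class pairs and sigma_between otherwise.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            sigma = sigma_within if labels[i] == labels[j] else sigma_between
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K

def diffusion_matrix(K):
    """Row-normalize the kernel so that each row is a probability distribution."""
    return [[k / sum(row) for k in row] for row in K]

# Toy data: two small clusters with different class labels (hypothetical values).
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
labels = [0, 0, 1, 1]

K = discriminant_kernel(X, labels, sigma_within=1.0, sigma_between=0.5)
P = diffusion_matrix(K)
```

With these illustrative widths, transitions between samples of the same class receive more weight than transitions across classes, which is the effect the discriminant kernel scheme is meant to achieve.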
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2015, Issue 2, pp. 470-475 (6 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (60774070, 61174119)
Key Program of the National Natural Science Foundation of China (61034006)
Keywords
Diffusion Maps (DM)
nonlinear dimensionality reduction
discriminant kernel scheme
category label
kernel function
manifold learning