Abstract
To address gene chip data characterized by high dimensionality, small sample size, and heavy noise, this paper proposes a feature gene selection and denoising method based on principal component analysis (PCA) and the k-nearest neighbor distance (k-DNN). PCA is first used to capture the pattern features in a low-dimensional projection space. Each gene's contributions across the principal loadings are summed, the genes are ranked by this total, and those with the largest contributions are selected as feature genes (FGs). The k-nearest neighbor distance is then used to remove outliers, so that classification accuracy is stable and efficient. Experimental results show that feature genes chosen with the proposed method achieve higher classification accuracy and more stable performance.
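The two stages described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how loadings are weighted, how many components are kept, or how the outlier threshold is set. Here the contribution of a gene is assumed to be the sum of its absolute loadings over the first `n_components` principal components, and a sample is assumed to be an outlier when its mean distance to its `k` nearest neighbors exceeds the mean plus `n_std` standard deviations. The function names `select_feature_genes` and `knn_distance_outliers` and all parameters are hypothetical.

```python
import numpy as np


def select_feature_genes(X, n_components=3, n_genes=50):
    """Return indices of the n_genes genes with the largest summed PCA loadings.

    X has shape (n_samples, n_genes_total). Summing absolute loadings is an
    assumption; the abstract only says contributions are summed and ranked.
    """
    Xc = X - X.mean(axis=0)  # center each gene
    # Rows of Vt are the principal loading vectors of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    contribution = np.abs(Vt[:n_components]).sum(axis=0)
    return np.argsort(contribution)[::-1][:n_genes]


def knn_distance_outliers(X, k=3, n_std=2.0):
    """Return a boolean mask marking samples with unusually large k-NN distance.

    The threshold rule (mean + n_std * std) is an assumed choice.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)           # ignore distance to self
    knn_dist = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    threshold = knn_dist.mean() + n_std * knn_dist.std()
    return knn_dist > threshold
```

In use, one would call `select_feature_genes` on the expression matrix, keep only the selected columns, and then drop the samples flagged by `knn_distance_outliers` before training a classifier. Whether outlier removal is applied per class or over all samples is not stated in the abstract, so the sketch applies it over all samples.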
Source
《福州大学学报(自然科学版)》
CAS
CSCD
PKU Core Journal
2013, Issue 1, pp. 49-52 (4 pages)
Journal of Fuzhou University (Natural Science Edition)
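Funding
Ministry of Education Doctoral Program Fund for New Teachers (20113514120007)
Natural Science Foundation of Fujian Province (2010J05132)
Research Project of the Fujian Provincial Department of Education (JA10034)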
Keywords
microarray gene expression
feature gene selection
PCA
k-nearest neighbor
noise reduction