Abstract
Microarray data are characterized by a large data volume, few samples and very high gene dimensionality. To address this, a classification method that reduces the dimensionality of microarray data in several steps was proposed. In the first step, significance analysis of microarrays (SAM) was used to select a subset of differentially expressed genes (DEGs). In the second step, a step-by-step support vector machine (SVM) classification algorithm was applied to this DEG subset to reduce its gene dimensionality further. Evaluated on three colorectal cancer and leukemia datasets, the method produced small feature-gene subsets with strong classification ability. The results show that the approach selects tumor feature genes quickly and effectively.
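The two-step pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The SAM step computes only the relative-difference statistic d(i) = (mean_1 - mean_0) / (s(i) + s0) in the form of Tusher et al. It fixes the fudge factor s0 at the median gene-wise spread and keeps genes by ranking |d(i)|. Real SAM chooses s0 from the data and uses permutations to estimate a false discovery rate. The abstract does not specify the stepwise SVM criterion, so the second step swaps in SVM recursive feature elimination (SVM-RFE). A linear SVM, trained here with the Pegasos sub-gradient method, is refit repeatedly, and the gene with the smallest absolute weight is dropped each round. All function names and the parameters `n_deg`, `n_final`, `lam` and `epochs` are hypothetical.

```python
import math
import statistics


def sam_d_scores(X, y, s0=None):
    """SAM-style relative difference d(i) = (mean_1 - mean_0) / (s(i) + s0) per gene.

    X: samples x genes (list of lists); y: 0/1 class label per sample.
    Assumption: when s0 is None it is fixed at the median of the s(i) values,
    a simplification of SAM's data-driven choice; no permutation/FDR step.
    """
    grp1 = [row for row, lab in zip(X, y) if lab == 1]
    grp0 = [row for row, lab in zip(X, y) if lab == 0]
    n1, n0 = len(grp1), len(grp0)
    a = (1 / n1 + 1 / n0) / (n1 + n0 - 2)
    diffs, spreads = [], []
    for j in range(len(X[0])):
        v1 = [row[j] for row in grp1]
        v0 = [row[j] for row in grp0]
        m1, m0 = sum(v1) / n1, sum(v0) / n0
        ss = sum((v - m1) ** 2 for v in v1) + sum((v - m0) ** 2 for v in v0)
        diffs.append(m1 - m0)
        spreads.append(math.sqrt(a * ss))  # gene-specific scatter s(i)
    if s0 is None:
        s0 = statistics.median(spreads)
    return [d / (s + s0) if s + s0 > 0 else 0.0 for d, s in zip(diffs, spreads)]


def standardize(X, genes):
    """Z-score the chosen genes across samples and append a constant bias feature."""
    cols = []
    for j in genes:
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(sample) + [1.0] for sample in zip(*cols)]


def train_linear_svm(Z, y, lam, epochs):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos sub-gradient steps.

    Samples are visited in a fixed cyclic order, so the result is deterministic.
    """
    labels = [1.0 if lab == 1 else -1.0 for lab in y]
    w = [0.0] * len(Z[0])
    t = 0
    for _ in range(epochs):
        for x, yi in zip(Z, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wk * xk for wk, xk in zip(w, x))
            w = [(1.0 - eta * lam) * wk for wk in w]  # shrink (L2 penalty)
            if margin < 1.0:  # hinge loss active: step toward y * x
                w = [wk + eta * yi * xk for wk, xk in zip(w, x)]
    return w


def svm_rfe(X, y, genes, n_final, lam=0.1, epochs=50):
    """Stand-in for the stepwise SVM step: SVM-RFE, one gene removed per round."""
    selected = list(genes)
    while len(selected) > n_final:
        w = train_linear_svm(standardize(X, selected), y, lam, epochs)
        # w[-1] is the bias weight, so only the first len(selected) entries are ranked
        weakest = min(range(len(selected)), key=lambda k: abs(w[k]))
        selected.pop(weakest)
    return selected


def select_feature_genes(X, y, n_deg, n_final, lam=0.1, epochs=50):
    """Step 1: keep the n_deg genes with largest |d|; step 2: prune to n_final."""
    d = sam_d_scores(X, y)
    ranked = sorted(range(len(d)), key=lambda j: -abs(d[j]))
    return svm_rfe(X, y, ranked[:n_deg], n_final, lam, epochs)
```

Called as `select_feature_genes(X, y, n_deg=..., n_final=...)` with `X` as a samples-by-genes list of lists and `y` as 0/1 labels, the sketch returns the indices of the retained genes. Dropping a single gene per round keeps it short. With thousands of genes, SVM-RFE is usually run removing a fraction of the remaining genes per round instead.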
Source
Journal of Fudan University: Natural Science (《复旦学报(自然科学版)》)
CAS
CSCD
Peking University Core Journal
2008, No. 4, pp. 541-544 (4 pages)
Keywords
microarray data
feature gene selection
significance analysis of microarrays (SAM)
support vector machine (SVM)
dimensionality reduction