Abstract
DNA microarray technology makes it possible to observe the expression of thousands of genes simultaneously. Microarray data, however, combine high dimensionality with small sample sizes, which makes analysis difficult. Selecting informative genes from microarray data (feature selection) is therefore an important and meaningful task in bioinformatics research and applications. This paper uses the optimal orthogonal centroid feature selection (OCFS) algorithm to select informative genes and compares it with gene selection methods based on the signal-to-noise ratio and on a genetic algorithm. A support vector machine (SVM) is then used to classify the samples using the selected genes. In experiments on the classic leukemia data set, the method classified 33 of the 34 samples in the test set correctly, which demonstrates its applicability.
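The gene-ranking step mentioned in the abstract can be illustrated with a small sketch. It assumes the scoring rule commonly given for OCFS, in which each feature is scored by the class-size-weighted squared distance between each class centroid and the global centroid along that feature, and the k highest-scoring features are kept. The function names `ocfs_scores` and `select_top_genes` and the toy data are hypothetical and are not taken from the paper; the paper's own implementation and parameter choices are not shown in this record.

```python
from typing import List, Sequence


def ocfs_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Score each feature (gene) by the weighted squared distance between
    every class centroid and the global centroid along that feature.

    Assumed rule: score(i) = sum_j (n_j / n) * (m_j[i] - m[i]) ** 2
    """
    n = len(X)
    d = len(X[0])
    # Global centroid over all samples.
    global_centroid = [sum(row[i] for row in X) / n for i in range(d)]
    scores = [0.0] * d
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        n_j = len(rows)
        # Centroid of the samples belonging to this class.
        class_centroid = [sum(row[i] for row in rows) / n_j for i in range(d)]
        for i in range(d):
            scores[i] += (n_j / n) * (class_centroid[i] - global_centroid[i]) ** 2
    return scores


def select_top_genes(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k highest-scoring features."""
    scores = ocfs_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 4 samples x 3 genes, two classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.1],
    [3.0, 5.0, 0.3],
    [3.2, 4.9, 0.2],
]
y = [0, 0, 1, 1]

scores = ocfs_scores(X, y)
top = select_top_genes(X, y, k=1)
```

In this toy example only the first gene separates the two classes clearly, so it receives the largest score. In a pipeline like the one the abstract describes, the selected columns would then be passed to an SVM classifier from whatever library is available.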
Source
《华东理工大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journal
2007, No. 2, pp. 233-237 (5 pages)
Journal of East China University of Science and Technology
Funding
National Natural Science Foundation of China (60373075)
Keywords
optimal orthogonal centroid
feature selection
feature extraction
DNA microarray
support vector machine