Abstract
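Based on gene expression profiles, a method is proposed that selects informative genes by class-weighted Bhattacharyya distance and identifies tumor subtypes with artificial neural networks (ANNs). Gene expression data from small round blue cell tumors of childhood (SRBCTs) were analysed: the class-weighted Bhattacharyya distance of each gene was computed on the training set, informative genes were selected accordingly to build several ANN models, and the models' classification ability was validated on an independent test set. An informative gene combination of 40 genes was then fixed by the criterion of minimum classification error. The ANN model built on this combination not only correctly identified the subtype of every diseased sample but also distinguished non-diseased samples.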
An approach to the molecular classification of cancers based on gene expression profiles is proposed. Gene expression data from SRBCTs were analysed, and the weighted Bhattacharyya distance of each gene was used as the criterion for ranking genes on the training dataset. Artificial neural networks (ANNs) were then trained on the expression data of dozens of top-ranked genes and evaluated on the test samples. According to the minimum classification error, a candidate informative gene combination comprising the 40 top-ranked genes was established. With these 40 genes, 100% correct classification was obtained on both the training samples and the test SRBCTs. To test whether the trained ANNs could tell SRBCTs from other samples, 5 blinded non-SRBCT samples were analysed; none was assigned a diagnosis because of low confidence.
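The gene-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the class weighting, so the version below rests on assumptions: each gene is modelled as a univariate Gaussian within each class, and the pairwise Bhattacharyya distances are summed with weights equal to the product of the two class priors. The function names (`bhattacharyya_1d`, `class_weighted_scores`, `top_genes`) and the `eps` variance floor are hypothetical and are not taken from the paper.

```python
import numpy as np


def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    var_term = 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2)))
    return mean_term + var_term


def class_weighted_scores(X, y, eps=1e-8):
    """Score every gene (column of X) by a prior-weighted sum of pairwise distances.

    X: array of shape (n_samples, n_genes); y: array of class labels.
    The prior-product weighting is an assumption, not the paper's definition.
    """
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # eps keeps the logarithm finite for genes that are constant within a class
    variances = np.stack([X[y == c].var(axis=0) + eps for c in classes])

    scores = np.zeros(X.shape[1])
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            scores += priors[i] * priors[j] * bhattacharyya_1d(
                means[i], variances[i], means[j], variances[j]
            )
    return scores


def top_genes(X, y, k=40):
    """Indices of the k highest-scoring genes, best first."""
    scores = class_weighted_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

In a workflow like the one described, the scores would be computed on the training samples only, and the returned indices would then select the input columns for each candidate ANN, with `k` chosen by comparing classification error on the independent test set.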
Source
《计算机应用》
CSCD
PKU Core (Peking University Core Journals)
2004, No. 11, pp. 131-134 (4 pages)
Journal of Computer Applications
Funding
Key Program of the National Natural Science Foundation of China (60234020)