Tumor diagnosis by analyzing gene expression profiles has become an active topic in bioinformatics, and the main problem is to identify the genes related to a tumor. This paper proposes a rank sum method to identify the related genes based on the rank sum test theory in statistics. The tumor diagnosis system is constructed with a support vector machine (SVM) trained on the expression profiles of the related genes. The experiments demonstrate that the tumor diagnosis system built with the rank sum method and SVM reaches an accuracy of 96.2% on the colon data and 100% on the leukemia data.
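The gene selection step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact statistic, threshold, or number of genes used, so everything below is an assumption: it scores each gene with the normal-approximation z-score of the Wilcoxon rank sum (without tie correction) and keeps the top-scoring genes. The function names `average_ranks`, `rank_sum_z`, and `select_genes` and the toy data are hypothetical, and the SVM training step is omitted.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(tumor, normal):
    """z-score of the rank sum of `tumor` under the null of no difference.

    Uses the normal approximation without tie correction (an assumption).
    """
    n1, n2 = len(tumor), len(normal)
    ranks = average_ranks(list(tumor) + list(normal))
    w = sum(ranks[:n1])                      # rank sum of the tumor samples
    mean = n1 * (n1 + n2 + 1) / 2            # expected rank sum under the null
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # its standard deviation
    return (w - mean) / sd


def select_genes(expression, labels, top_k):
    """Pick the top_k genes by |z|.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: list of 0/1 per sample (1 = tumor, 0 = normal).
    """
    scores = {}
    for gene, values in expression.items():
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(rank_sum_z(tumor, normal))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (hypothetical): three tumor samples followed by three normal samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "g_up": [9, 8, 7, 1, 2, 3],
    "g_flat": [1, 6, 3, 4, 5, 2],
    "g_down": [1, 2, 3, 7, 8, 9],
}
selected = select_genes(expression, labels, top_k=2)
```

In this toy case the two genes whose tumor and normal values are fully separated are selected, and the gene with interleaved values is dropped. The expression profiles restricted to the selected genes would then be the input to a classifier such as an SVM.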
Background: Gene co-expression and differential co-expression analysis have been increasingly used to study co-functional and co-regulatory biological mechanisms from large-scale transcriptomics data sets. Methods: In this study, we develop MRHCA, a nonparametric approach to identify hub genes and modules in a large co-expression network with low computational and memory cost. Results: We have applied the method to simulated transcriptomics data sets and demonstrated that MRHCA can accurately identify hub genes and estimate the size of co-expression modules. By applying MRHCA and differential co-expression analysis to E. coli and TCGA cancer data, we have identified significant condition-specific activated genes in E. coli and distinct gene expression regulatory mechanisms between the cancer types with high copy number variation and small somatic mutations. Conclusion: Our analysis has demonstrated that, compared with existing methods, MRHCA can (i) deal with large association networks, (ii) rigorously assess statistical significance for hubs and module sizes, (iii) identify co-expression modules with low associations, (iv) detect small and significant modules, and (v) allow genes to be present in more than one module.