Abstract
When the difference in gene expression between two groups of samples is small, or the sample size is small, a conventional false discovery rate (FDR) control level (e.g. 5% or 10%) often fails to identify enough differentially expressed (DE) genes for subsequent functional enrichment analysis. However, functional enrichment analysis is, to some extent, robust to false discoveries among the DE genes. Therefore, a relaxed FDR control level, which tolerates a higher FDR among the selected DE genes, may still reliably detect disease-associated functions. In this study, we analyzed five gene expression (microarray) datasets on breast cancer metastasis. Using the three datasets with relatively strong differential expression signals, we demonstrated that the functional enrichment results remained highly robust even when the FDR of the DE genes reached 25%. Next, in the other two datasets, which had weak differential expression signals, we selected DE genes at FDR < 25%, performed functional enrichment analysis, and compared the results with those from the first three datasets. The results showed that many functions associated with breast cancer metastasis could still be reliably identified when DE genes were selected at a relaxed FDR control level. The results also suggested that some broadly defined biological processes (such as cell division, the cell cycle and DNA replication) are perturbed as a whole during breast cancer metastasis, reflecting that breast cancer metastasis is a 'systems disease' involving widespread changes in gene expression.
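The abstract does not say which FDR-controlling procedure or which enrichment test was used. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure for selecting DE genes and a one-sided hypergeometric test for over-representation of a functional category (both are assumptions, not necessarily the authors' methods), the code below shows how relaxing the FDR level from 5% to 25% enlarges the DE gene list that feeds the enrichment test. The function names, the p-values and the gene category are hypothetical toy values that illustrate the mechanics only; they are not data from the study.

```python
from math import comb


def benjamini_hochberg(pvalues, fdr):
    """Return sorted indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at the given FDR level (assumed selection method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * fdr / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is <= cutoff_rank.
    return sorted(order[:cutoff_rank])


def hypergeometric_enrichment_p(n_total, n_category, n_selected, n_overlap):
    """One-sided P(X >= n_overlap), where X is the number of category genes
    among n_selected genes drawn without replacement from n_total genes."""
    denom = comb(n_total, n_selected)
    upper = min(n_category, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


# Hypothetical toy data: 10 genes with made-up p-values, and a made-up
# functional category containing genes 0, 1, 4 and 5.
PVALUES = [0.001, 0.008, 0.012, 0.018, 0.03, 0.04, 0.2, 0.5, 0.7, 0.9]
CATEGORY = {0, 1, 4, 5}

if __name__ == "__main__":
    for level in (0.05, 0.25):
        selected = benjamini_hochberg(PVALUES, level)
        overlap = len(CATEGORY.intersection(selected))
        p = hypergeometric_enrichment_p(
            len(PVALUES), len(CATEGORY), len(selected), overlap
        )
        print(f"FDR {level:.0%}: {len(selected)} DE genes, "
              f"overlap {overlap}, enrichment p = {p:.3f}")
```

In this toy run, the 5% level selects 4 genes and the 25% level selects 6, so the category overlap grows from 2 to 4 and its enrichment p-value drops. This only shows how the list size feeds the test; it does not by itself demonstrate the robustness to false discoveries that the study reports.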
Source
Acta Biophysica Sinica
CAS
CSCD
Peking University Core Journal
2012, No. 3, pp. 232-241 (10 pages)
Funding
National Natural Science Foundation of China (30970668, 31100901, 81071646, 91029717)
Heilongjiang Provincial Fund for Distinguished Young Scholars (JC200808)
Keywords
Breast cancer
Metastasis
Statistical power
Differentially expressed gene
Function