Abstract
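Detecting differential expression of genes and isoforms is an important way to understand their function, and it has become a major research direction in transcriptome studies. In recent years, RNA-seq has been widely used to detect differentially expressed genes. To account for the non-uniform distribution of reads, read counts are usually modeled with the negative binomial distribution. Most existing negative binomial models are fitted directly to gene-level read counts and therefore cannot detect differentially expressed isoforms. We propose a negative binomial model built on the gene and isoform expression levels computed by the PGseq model, and use the exact test for differential analysis, which solves the problem of detecting differentially expressed isoforms. Experiments show that the method achieves high accuracy and sensitivity in differential detection at both the gene and isoform levels.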
High-throughput RNA sequencing (RNA-seq) has recently been widely applied in transcriptome analysis. One important research direction in transcriptome studies is to detect differential expression (DE) of genes and isoforms. RNA-seq experiments produce read counts that are affected by biological and technical variation. To distinguish systematic changes in expression between conditions from noise, the counts are frequently modeled by the negative binomial distribution. Most proposed methods that use negative binomial models are based on statistics that compare read counts between conditions. Unfortunately, because of read mapping ambiguity, it is difficult to obtain exact read counts for each isoform. As a result, these methods cannot be applied to detecting DE isoforms. In this paper, we propose a method, PGDiff, to detect differential expression for both genes and isoforms, based on negative binomial models of the gene and isoform expression derived from the PGseq package. Instead of modeling the distribution of the total count for each gene, PGseq models the variability of the count for each individual exon and obtains the expression of each gene and each isoform. Unlike the count-based methods, PGDiff detects differential expression in two steps. The first step is to obtain the expression of genes and isoforms. In the second step, we use the exact test to detect differential expression from the obtained expression values and the negative binomial models. For DE gene detection, we evaluated the proposed approach on the MAQC dataset and the Griffith dataset, and compared its performance with that of the currently popular packages MMDiff, Cuffdiff, BitSeq, DESeq and baySeq. For DE isoform detection, we designed two types of comparison using the human breast cancer dataset, and compared the approach with the packages Cuffdiff and BitSeq and with the t-test method. On these datasets, the proposed method performed favorably in sensitivity and specificity at both the gene and isoform levels.
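The abstract does not spell out how the exact test in the second step is computed. As a rough illustration only, the following is a minimal sketch of a conditional exact test for negative binomial counts in the style commonly used by count-based tools. It is not PGDiff's code. It assumes integer (rounded) expression values, equal library sizes, and a single known dispersion shared by both conditions; the function names `nb_logpmf` and `nb_exact_test` are hypothetical.

```python
import math


def nb_logpmf(k, size, mean):
    """Log-probability of count k under a negative binomial
    parameterised by size (1 / dispersion) and mean."""
    return (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
            + size * math.log(size / (size + mean))
            + k * math.log(mean / (size + mean)))


def nb_exact_test(counts_a, counts_b, dispersion):
    """Two-sided conditional exact test for one gene or isoform.

    Assumptions (not taken from the source): integer counts, equal
    library sizes, and a known common dispersion > 0.
    """
    n_a, n_b = len(counts_a), len(counts_b)
    s_a, s_b = sum(counts_a), sum(counts_b)
    total = s_a + s_b
    if total == 0:
        return 1.0  # no reads at all: no evidence of a difference

    # Under the null hypothesis both conditions share one mean.
    mean = total / (n_a + n_b)
    size = 1.0 / dispersion

    # The sum of n i.i.d. NB(size, mean) counts is NB(n * size, n * mean).
    def log_joint(a):
        return (nb_logpmf(a, n_a * size, n_a * mean)
                + nb_logpmf(total - a, n_b * size, n_b * mean))

    # Probability of every split of the total between the two conditions,
    # rescaled by the largest term for numerical stability.
    logs = [log_joint(a) for a in range(total + 1)]
    peak = max(logs)
    probs = [math.exp(v - peak) for v in logs]
    norm = sum(probs)

    # p-value: total probability of splits no more likely than the observed one.
    observed = probs[s_a]
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail / norm)
```

For example, `nb_exact_test([5, 6], [50, 60], dispersion=0.1)` yields a small p-value, whereas two conditions with equal totals yield a p-value close to 1. In practice the dispersion would have to be estimated from the data, which this sketch leaves out.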
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core Journals
2016, Issue 2, pp. 253-260 (8 pages)
Journal of Nanjing University (Natural Science)
Funding
National Natural Science Foundation of China (61170152)
Fundamental Research Funds for the Central Universities (CXZZ11_0217)
Keywords
RNA-seq
differentially expressed genes
differentially expressed isoforms
negative binomial distribution
exact test