The transcription start site (TSS) region shows greater variability than other promoter elements. We examine this variability using information content as a measure. In this study, the variability is significant in a block of 5 nucleotides (nt) surrounding the TSS compared with a block of 15 nt. This suggests that the region actually involved is in the range of 5-10 nt in size. For Escherichia coli, the information content from dinucleotide substitution matrices clearly shows better discrimination, suggesting the presence of some correlations. However, for human this effect is much weaker, and for mouse it is practically absent. We conclude that the presence of short-range correlations within the TSS region is species-dependent and not universal. We further observe that the mitochondrial control element contains other variable regions apart from the TSS. Effective comparisons can only be made on blocks; single-nucleotide comparisons do not give any detectable signal.
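As a rough illustration of block-wise information content, the sketch below sums the per-column information (in bits) over a block of aligned sequences. It is a minimal sketch under stated assumptions, not the study's method: it uses single-nucleotide frequencies only (no dinucleotide substitution matrices), applies no small-sample correction, and the names `column_information`, `block_information`, and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_information(column, alphabet="ACGT"):
    """Information content (bits) of one column: log2(alphabet size) minus Shannon entropy."""
    counts = Counter(base for base in column if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(alphabet)) - entropy

def block_information(sequences, start, width):
    """Sum of per-column information content over `width` columns beginning at `start`."""
    return sum(
        column_information([seq[i] for seq in sequences])
        for i in range(start, start + width)
    )

# Toy alignment (hypothetical data): columns 0-2 are conserved, columns 3-4 vary.
aligned = ["ACGTA", "ACGTC", "ACGAG", "ACGCT"]
conserved = block_information(aligned, 0, 3)
variable = block_information(aligned, 3, 2)
print(conserved, variable)
```

A lower block total indicates a more variable region, so comparing totals for blocks of different widths around a site is one simple way to express the block-level comparison described above.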
In this paper we present NPEST, a novel tool for the analysis of expressed sequence tag (EST) distributions and transcription start site (TSS) prediction. The method estimates an unknown probability distribution of ESTs using a maximum likelihood (ML) approach, and this estimate is then used to predict TSS positions. Accurate identification of TSSs is an important genomics task, since the position of regulatory elements relative to the TSS can have large effects on gene regulation, and the performance of promoter motif-finding methods depends on correct TSS identification. Our probabilistic approach extends recognition to multiple TSSs per locus, which may help enhance the understanding of alternative splicing mechanisms. This paper presents an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool we analyzed 16520 loci and developed a database of TSSs, which is now publicly available at www.glaeombio.net/NPEST.
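The general idea of estimating a distribution of EST positions and reading TSS candidates from it can be sketched as follows. This is not the NPEST algorithm, whose details are not given here. It substitutes the simplest case: for an unconstrained discrete distribution, the ML estimate is the vector of relative frequencies, and candidate sites are taken as local maxima above a threshold. The function names, the `min_prob` threshold, and the example positions are all hypothetical.

```python
from collections import Counter

def empirical_distribution(positions):
    """ML estimate of an unconstrained discrete distribution: relative frequencies."""
    counts = Counter(positions)
    total = len(positions)
    return {pos: n / total for pos, n in sorted(counts.items())}

def candidate_tss(positions, min_prob=0.2):
    """Return positions that are local maxima of the estimate with probability >= min_prob."""
    dist = empirical_distribution(positions)
    peaks = []
    for pos, prob in dist.items():
        left = dist.get(pos - 1, 0.0)   # unobserved neighbours count as zero
        right = dist.get(pos + 1, 0.0)
        if prob >= min_prob and prob > left and prob >= right:
            peaks.append(pos)
    return peaks

# Hypothetical EST 5'-end coordinates for one locus, clustered around two sites.
est_starts = [100, 100, 101, 100, 99, 150, 150, 151, 150, 100]
peaks = candidate_tss(est_starts)
print(peaks)
```

Allowing more than one peak per locus mirrors the multiple-TSS setting described above; a real analysis would need smoothing and a principled choice of threshold.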