Abstract
Using the stop-codon (terminal codon) method that we proposed, all ORFs of the first kind (denoted theoretical ORFs) are computed in the yeast, Escherichia coli and Bacillus subtilis genomes. The numbers of theoretical ORFs and of known ORFs are given as distributions over ORF length. The two distributions are essentially consistent with each other when the ORF length is larger than 150 amino acids. For theoretical ORFs shorter than 150 amino acids, there is a good linear relation between the logarithm of the number of theoretical ORFs and their length. We suppose that these short theoretical ORFs, and their linear relation with ORF length, come from the randomness of the DNA sequences. For component-constrained random DNA sequences, the relations among the number of ORFs, the ORF length, the total sequence length and the GC content are analyzed. The results show that our supposition is correct. A distribution curve is given for the proportion of possible coding sequences among short theoretical ORFs, that is, the ratio of the number of known ORFs to the number of short theoretical ORFs. This relation may be a useful reference for identifying short coding sequences.
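The randomness argument can be illustrated with a minimal sketch. It is not the authors' program, and it rests on assumptions that the abstract does not state: an ORF is taken here to be a stop-free run of codons bounded by two in-frame stop codons, only the three forward frames are scanned, and the "component-constrained" random sequence is modelled as independent bases with P(G) = P(C) and P(A) = P(T) fixed by the GC content. The function names are hypothetical.

```python
import random

# Standard genetic code stop codons (an assumption; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_dna(length, gc_content, rng):
    """Draw independent bases with P(G)=P(C)=gc/2 and P(A)=P(T)=(1-gc)/2."""
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    # Weights follow the order of the alphabet string "ACGT".
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def stop_to_stop_orf_lengths(seq):
    """Return lengths (in codons) of stop-free runs enclosed by in-frame stops.

    Only the three forward reading frames are scanned. Runs that touch the
    ends of the sequence are skipped because they are not closed on both sides.
    """
    lengths = []
    for frame in range(3):
        run = 0
        seen_stop = False
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if seen_stop and run > 0:
                    lengths.append(run)
                seen_stop = True
                run = 0
            else:
                run += 1
    return lengths


def stop_probability(gc_content):
    """Probability that a random codon is TAA, TAG or TGA under the base model."""
    a = t = (1.0 - gc_content) / 2.0
    g = gc_content / 2.0
    return t * a * a + t * a * g + t * g * a
```

Under this model the probability that a run has exactly n codons is proportional to (1 - p)^n, where p is the value returned by `stop_probability`, so the logarithm of the ORF count falls linearly with length with slope ln(1 - p). Because p decreases as the GC content rises, GC-rich random sequences are expected to produce longer chance ORFs, which is the kind of dependence on GC content that the abstract refers to.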
Source
《生物物理学报》
CAS
CSCD
Peking University Core Journals
2002, No. 3, pp. 307-312 (6 pages)
Acta Biophysica Sinica
Funding
National Natural Science Foundation of China (10147204)
Natural Science Foundation of Inner Mongolia
Keywords
Yeast
Escherichia coli
Bacillus subtilis
Genome
Short ORF
Distribution
Cause of formation
Component-constrained random DNA sequences
Open reading frame distribution