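Abstract
This paper studies the correlation between the statistical features of peptide composition in bacterial proteins and genomic GC (guanine + cytosine) content. For short peptides, composition specificity is strongly correlated with GC content. As peptide length increases, this correlation changes abruptly and is quickly lost. This result indicates that methods which infer bacterial phylogenetic relationships from composition specificity do provide information distinct from GC content, and can therefore classify bacteria effectively.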
In recent decades, many methods have been proposed for constructing genome trees. Among them, the alignment-free K-string composition approach has shown non-negligible advantages. Meanwhile, the species specificity of GC (guanine + cytosine) content, which is in fact the lowest-order form of K-string composition, has long been recognized, especially in bacteria. However, its resolution is too low to be used for reconstructing phylogeny. Motivated by these facts, this paper studies the relationship between the composition vectors of peptides and the GC content of the corresponding DNA sequences in bacteria. A strong correlation is found for short peptides. As peptide length increases, the correlation changes abruptly and quickly vanishes. These results indicate that the composition vectors of longer peptides do contain more precise species-specific information than GC content does, and can therefore effectively measure the genetic relationships among bacteria. Short peptides are clearly inadequate for this purpose.
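For illustration only, the two quantities compared in this study can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `gc_content`, `composition_vector` and `cosine_similarity` are hypothetical. `composition_vector` uses raw relative frequencies of overlapping length-k peptides, whereas some published composition-vector methods subtract a Markov-model background, which is omitted here. Cosine similarity is one common, assumed choice for comparing two vectors.

```python
import math
from collections import Counter

# The 20 standard amino acids; windows containing other symbols are skipped.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def gc_content(dna: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of a DNA sequence.

    Other symbols (e.g. N) are ignored. For a DNA sequence this equals
    f(G) + f(C) taken from its length-1 (K = 1) composition.
    """
    counts = Counter(b for b in dna.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (counts["G"] + counts["C"]) / total


def composition_vector(proteins, k):
    """Relative frequencies of overlapping length-k peptides.

    Assumption: raw frequencies pooled over all proteins, stored sparsely
    (only observed peptides appear as keys); no background subtraction.
    """
    counts = Counter()
    for seq in proteins:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if all(c in AMINO_ACIDS for c in word):
                counts[word] += 1
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (assumed metric)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    print(gc_content("ATGCGC"))                # 4 of 6 bases are G or C
    print(composition_vector(["MKVL"], 2))     # MK, KV, VL each 1/3
```

With k = 1 the peptide vector reduces to amino-acid frequencies, and increasing k gives the longer peptides whose composition the abstract reports as decoupling from GC content.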
Source
Scientia Sinica Physica, Mechanica & Astronomica
CSCD
Peking University Core Journal
2015, No. 5, pp. 10-17 (8 pages)
Funding
National Natural Science Foundation of China (Grant No. 11147020)
Fundamental Research Funds for the Central Universities (Grant No. GK201102028)