Influenza A viruses have three gene segments, M, NS, and PB1, that each code for more than one protein. Overlapping genes on the same segment are interdependent, and this interdependence could be reflected in the evolutionary constraints, host distinction, and co-mutations of influenza. Most previous studies of overlapping genes focused on their unique evolutionary constraints, and little has been done to assess the potential impact of the overlap on other biological aspects of influenza. In this study, our aim was to explore the mutual dependence in host differentiation and co-mutations in M, NS, and PB1 of avian, human, 2009 H1N1, and swine viruses, using Random Forests, information entropy, and mutual information. The host markers and the highly co-mutated individual sites and site pairs (P values < 0.035) in the three gene segments were identified, and their relative significance between the overlapping genes was calculated. Further, Random Forests predicted that among the three stop codons in the current PB1-F2 gene of 2009 H1N1, the significance of a mutation for host differentiation was, from most to least, that at sites 12, 58, and 88; that is, the closer a mutation was to the start of the gene, the more important it was. Finally, our sequence analysis surprisingly revealed that the full-length PB1-F2, if all three stop codons were mutated, would function more as a swine protein than as a human protein, although the PB1 of 2009 H1N1 was derived from human H3N2.
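As a rough illustration of the two information measures named in the abstract above, the sketch below computes Shannon entropy for one alignment column and mutual information for a pair of columns. It is not the authors' pipeline: the function names `column_entropy` and `mutual_information` and the toy alignment are hypothetical, and the study's significance testing (the P values) and its Random Forests step are not reproduced.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one column."""
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def mutual_information(col_a, col_b):
    """Mutual information I(A;B) = H(A) + H(B) - H(A,B) for two columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same set of sequences")
    joint = list(zip(col_a, col_b))  # paired symbols, one pair per sequence
    return column_entropy(col_a) + column_entropy(col_b) - column_entropy(joint)


# Hypothetical toy alignment: four sequences, three sites each.
toy_alignment = ["MKT", "MKT", "LRT", "LRT"]
columns = list(zip(*toy_alignment))  # transpose rows into site columns

site_entropies = [column_entropy(col) for col in columns]
mi_sites_0_1 = mutual_information(columns[0], columns[1])
mi_sites_0_2 = mutual_information(columns[0], columns[2])
```

In this toy data, sites 0 and 1 always change together, so their mutual information equals the entropy of either site, while the invariant site 2 shares no information with site 0.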
Many secrets of the genetic code have been uncovered through mathematical analysis of experimental data: the diversity of genes, their structures, and genetic codes. New properties of the genetic code are presented, and its most important integral characteristics are established. Two groups of such characteristics are distinguished. The first group covers the regions of DNA where genes overlap in pairs; all 5 cases of overlap allowed by the structure of DNA were investigated. The second group covers the longest regions of DNA in which there is no gene overlap. The interrelation of the integral characteristics in these two groups is shown. As a result, a number of previously unknown effects were discovered. It was possible to establish two functions in which all the reassigned codons of mitochondrial genetic codes (of humans and other organisms) participate, as well as a significant difference between the integral characteristics of these codes and those of the standard code. Other properties of the structure of the genetic code that follow from these results are also established. The results allowed us to pose and solve a new breakthrough problem: the calculation of the genetic code. The full version of the solution to this problem was published in this journal in August 2017.
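The abstract above does not list the 5 cases of pairwise overlap. One common reading, taken here purely as an assumption, is that a second gene can be read on the same strand shifted by 1 or 2 nucleotides, or on the opposite strand in any of its three frames. The sketch below enumerates the codons of those five frames for a short sequence; the helper names and the example sequence are hypothetical.

```python
# Complement table for unambiguous DNA bases only (assumption: no N or gaps).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into complete triplets, starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def overlapping_frames(seq):
    """Codons of the five frames that can overlap frame 0 of seq.

    Opposite-strand offsets are counted from the start of the
    reverse complement, which is a convention chosen for this sketch.
    """
    rc = reverse_complement(seq)
    return {
        "same strand, shift 1": codons(seq, 1),
        "same strand, shift 2": codons(seq, 2),
        "opposite strand, shift 0": codons(rc, 0),
        "opposite strand, shift 1": codons(rc, 1),
        "opposite strand, shift 2": codons(rc, 2),
    }


example = "ATGGCCTAA"  # hypothetical sequence; frame 0 reads ATG GCC TAA
frames = overlapping_frames(example)
```

Each nucleotide of the reference frame then belongs to up to five other codons, which is what couples the amino acids of overlapping genes.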
One of the problems in the development of the mathematical theory of the genetic code (a summary is presented in [1], the detailed account in [2]) is the problem of calculating the genetic code. No similar problem is known elsewhere, and it could only have been posed in the 21st century. This work is devoted to one approach to solving it. For the first time, a detailed description of the method for calculating the genetic code is provided; the idea was published earlier [3], and the choice of one of the most important sets for the calculation was based on an article [4]. Such a set of amino acids corresponds to a complete representation of the set of overlapping gene triples belonging to the same DNA strand. A separate issue was the starting point that triggers the iterative search over all codes consistent with the initial data. Mathematical analysis showed that this set contains some ambiguities, which were found thanks to our proposed compressed representation of the set. As a result, the method of calculation was reduced to two main stages, with only single-valued domains used in the calculations at the first stage. The proposed approach made it possible to significantly reduce the amount of computation at each step in this complex discrete structure.