Funding: Supported by the Priority Research Program of the Chinese Academy of Sciences (CAS) (Grant No. XDB31000000) and Large-scale Scientific Facilities of the CAS (Grant No. 2017LSF-GBOWS-2).
Abstract: Advances in next-generation and long-read sequencing technologies continue to drive plant phylogenetic research. In the past decade, a large number of phylogenetic studies using hundreds to thousands of genes across a wealth of clades have emerged and ushered plant phylogenetics and evolution into a new era. At the same time, a roadmap is needed to help researchers choose among approaches when designing phylogenomic research. This review focuses on the utility of genomic data (from organelle genomes to both reduced-representation sequencing and whole-genome sequencing) in phylogenetic and evolutionary investigations, describes the baseline methodology of experimental and analytical procedures, and summarizes recent progress in flowering plant phylogenomics at the ordinal, familial, tribal, and lower levels. We also discuss the challenges, such as the adverse impact of systematic errors on orthology inference and phylogenetic reconstruction, and underlying biological factors, such as whole-genome duplication, hybridization/introgression, and incomplete lineage sorting, which together suggest that a bifurcating tree may not be the best model for the tree of life. Finally, we discuss promising avenues for future plant phylogenomic studies.
Funding: The National Natural Science Foundation of China (32272090, 32171994, and 32072023), the Central Plains Science and Technology Innovation Leader Project (214200510029 and 2022C01NY001), the Project of Sanya Yazhou Bay Science and Technology City (SCKY-JYRC-2022-88), and the National Key R&D Program of China (2021YFE0101200) provided financial support.
Abstract: Cotton (Gossypium) is a crucial economic crop and the primary source of natural fiber for the textile sector. However, the evolutionary mechanisms driving speciation within the genus Gossypium remain unresolved. In this investigation, we leveraged 25 Gossypium genomes and introduced four novel assemblies, G. harknessii, G. gossypioides, G. trilobum, and G. klotzschianum (Gklo), to examine the speciation history of this genus. Notably, we encountered intricate phylogenies potentially stemming from introgression. These complexities are further compounded by incomplete lineage sorting (ILS), a factor likely to have been instrumental in shaping the swift diversification of cotton. Our focus subsequently shifted to the rapid radiation episode during a brief period in Gossypium evolution. For a recently diverged lineage comprising G. davidsonii, Gklo, and G. raimondii, we constructed a finely detailed ILS map. Intriguingly, this analysis revealed a non-random distribution of ILS regions across the reference Gklo genome. Moreover, we identified signs of strong natural selection influencing specific ILS regions. Noteworthy variations pertaining to speciation emerged between the closely related sister species Gklo and G. davidsonii. Approximately 15.74% of speciation structural variation genes and 12.04% of speciation-associated genes were estimated to intersect with ILS signatures. These findings enrich our understanding of the role of ILS in adaptive radiation, shedding fresh light on the intricate speciation history of the genus Gossypium.
Funding: The National Natural Science Foundation of China (Nos. 31401969, 31772480) and the Natural Science Foundation of Jiangxi Province (No. 20161BAB214158).
Abstract: Incomplete lineage sorting and introgression are 2 major and nonexclusive causes of species-level non-monophyly. Distinguishing between these 2 processes is notoriously difficult because they can generate similar genetic signatures. Previous studies have suggested that 2 closely related duck species, the Chinese spot-billed duck Anas zonorhyncha and the mallard A. platyrhynchos, were polyphyletically intermixed. Here, we used wide geographical sampling, multilocus data, and a coalescent-based model to revisit this system. Our study confirms the finding that Chinese spot-billed ducks and mallards are not monophyletic. There was no apparent interspecific differentiation across loci except at the mitochondrial DNA (mtDNA) control region and the Z chromosome (CHD1Z). Based on an isolation-with-migration model and the geographical distribution of lineages, we suggest that both introgression and incomplete lineage sorting might contribute to the observed non-monophyly of the 2 closely related duck species. The mtDNA introgression was asymmetric, with high gene flow from Chinese spot-billed ducks to mallards and negligible gene flow in the opposite direction. Given that the 2 duck species are phenotypically distinctive but weakly genetically differentiated, future work based on genome-scale data is necessary to uncover genomic regions that are involved in divergence, and this work may provide further insights into the evolutionary histories of the 2 species and other waterfowl.
Funding: Supported by the National Key Research and Development Program of China (2017YFD0600701).
Abstract: Redwood trees (Sequoioideae), including Metasequoia glyptostroboides (dawn redwood), Sequoiadendron giganteum (giant sequoia), and Sequoia sempervirens (coast redwood), are threatened and widely recognized iconic tree species. Genomic resources for redwood trees could provide clues to their evolutionary relationships. Here, we report the 8-Gb reference genome of M. glyptostroboides and a comparative analysis with two related species. More than 62% of the M. glyptostroboides genome is composed of repetitive sequences. Clade-specific bursts of long terminal repeat retrotransposons may have contributed to genomic differentiation in the three species. The chromosomal synteny between M. glyptostroboides and S. giganteum is extremely high, whereas there has been significant chromosome reorganization in S. sempervirens. Phylogenetic analysis of marker genes indicates that S. sempervirens is an autopolyploid, and more than 48% of the gene trees are incongruent with the species tree. Results of multiple analyses suggest that incomplete lineage sorting (ILS) rather than hybridization explains the inconsistent phylogeny, indicating that genetic variation among redwoods may be due to random retention of polymorphisms in ancestral populations. Functional analysis of ortholog groups indicates that gene families of ion channels, tannin biosynthesis enzymes, and transcription factors for meristem maintenance have expanded in S. giganteum and S. sempervirens, which is consistent with their extreme height. As a wetland-tolerant species, M. glyptostroboides shows a transcriptional response to flooding stress that is conserved with that of analyzed angiosperm species. Our study offers insights into redwood evolution and adaptation and provides genomic resources to aid in their conservation and management.
Funding: S.X. acknowledges financial support from the National Natural Science Foundation of China (NSFC) grant (Nos. 91331204 and 31711530221), the Strategic Priority Research Program (No. XDBI3040100) and Key Research Program of Frontier Sciences (No. QYZDJ-SSW-SYS009) of the Chinese Academy of Sciences (CAS), the National Science Fund for Distinguished Young Scholars (No. 31525014), and the Program of Shanghai Academic Research Leader (No. 16XD1404700). S.X. is a Max-Planck Independent Research Group Leader and a member of the CAS Youth Innovation Promotion Association. S.X. also gratefully acknowledges the support of the National Program for Top-notch Young Innovative Talents of the "Wanren Jihua" Project. We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Abstract: Background: Genetic admixture refers to the process or consequence of interbreeding between two or more previously isolated populations within a species. Compared to many other evolutionary driving forces such as mutation, genetic drift, and natural selection, genetic admixture is a quick mechanism for shaping population genomic diversity. In particular, admixture results in "recombination" of genetic variants that have been fixed in different populations, which has many evolutionary and medical implications. Results: However, it is challenging to accurately reconstruct population admixture history and to understand population admixture dynamics. In this review, we provide an overview of models, methods, and tools for ancestry inference and admixture analysis. Conclusions: Many methods and tools used for admixture analysis were originally developed to analyze human data, but these methods can also be applied directly, or with slight modification, to study non-human species.
Funding: The Florida International University Tropics Program and the Susan S. Levine Trust. RTK and ELB also received support from the United States National Science Foundation (DEB-1118823 and DEB-1655683).
Abstract: Background: Previous phylogenetic studies that include the four recognized species of Gallus have resulted in a number of distinct topologies, with little agreement. Several factors could lead to the failure to converge on a consistent topology, including introgression, incomplete lineage sorting, different data types, or insufficient data. Methods: We generated three novel whole genome assemblies for Gallus species, which we combined with data from the published genomes of Gallus gallus and Bambusicola thoracicus (a member of the sister genus to Gallus). To determine why previous studies have failed to converge on a single topology, we extracted large numbers of orthologous exons, introns, ultra-conserved elements, and conserved non-exonic elements from the genome assemblies. This provided more than 32 million base pairs of data that we used for concatenated maximum likelihood and multispecies coalescent analyses of Gallus. Results: All of our analyses, regardless of data type, yielded a single, well-supported topology. We found some evidence for ancient introgression involving specific Gallus lineages as well as modest data type effects that had an impact on support and branch length estimates in specific analyses. However, the estimated gene tree spectra for all data types had a relatively good fit to their expectation given the multispecies coalescent. Conclusions: Overall, our data suggest that conflicts among previous studies probably reflect the use of smaller datasets (both in terms of number of sites and of loci) in those analyses. Our results demonstrate the importance of sampling large numbers of loci, each of which has a sufficient number of sites to provide robust estimates of gene trees. Low-coverage whole genome sequencing, as used here, represents a cost-effective means to generate the very large data sets, including multiple data types, that enabled us to obtain a robust estimate of Gallus phylogeny.
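The "expectation given the multispecies coalescent" mentioned in the abstract above can be illustrated for the simplest case, a rooted three-species tree. The sketch below is not the authors' analysis; it uses the standard three-taxon result that the two discordant gene-tree topologies are expected to be equally frequent, each with probability exp(-T)/3, where T is the internal branch length in coalescent units. The function names and the example counts are hypothetical.

```python
import math


def triplet_topology_probs(t_internal):
    """Expected rooted gene-tree topology frequencies for three species
    under the multispecies coalescent. `t_internal` is the internal
    branch length in coalescent units (generations / 2N, diploid)."""
    discordant_each = math.exp(-t_internal) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant_each,
        "discordant_1": discordant_each,
        "discordant_2": discordant_each,
    }


def estimate_internal_branch(n_concordant, n_disc1, n_disc2):
    """Invert P(discordant) = (2/3) * exp(-T) to get T from topology counts.
    Returns 0.0 when discordance reaches the 2/3 ceiling and infinity
    when no discordant trees are observed."""
    total = n_concordant + n_disc1 + n_disc2
    disc_frac = (n_disc1 + n_disc2) / total
    if disc_frac >= 2.0 / 3.0:
        return 0.0
    if disc_frac == 0.0:
        return math.inf
    return -math.log(1.5 * disc_frac)


# Hypothetical counts (not from any study above): 520 concordant trees
# and two discordant classes of 240 trees each.
t_hat = estimate_internal_branch(520, 240, 240)
expected = triplet_topology_probs(t_hat)
```

Under incomplete lineage sorting alone the two discordant classes should have similar counts, so a strong asymmetry between them is one commonly used hint that introgression may also be involved.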
Abstract: Although the effects of the coalescent process on sequence divergence and genealogies are well understood, the virtual majority of studies that use molecular sequences to estimate times of divergence among species have failed to account for the coalescent process. Here we study the impact of ancestral population size and incomplete lineage sorting on Bayesian estimates of species divergence times under the molecular clock when the inference model ignores the coalescent process. Using a combination of mathematical analysis, computer simulations and analysis of real data, we find that the errors on estimates of times and the molecular rate can be substantial when ancestral populations are large and when there is substantial incomplete lineage sorting. For example, in a simple three-species case, we find that if the most precise fossil calibration is placed on the root of the phylogeny, the age of the internal node is overestimated, while if the most precise calibration is placed on the internal node, then the age of the root is underestimated. In both cases, the molecular rate is overestimated. Using simulations on a phylogeny of nine species, we show that substantial errors in time and rate estimates can be obtained even when dating ancient divergence events. We analyse the hominoid phylogeny and show that estimates of the neutral mutation rate obtained while ignoring the coalescent are too high. Using a coalescent-based technique to obtain geological times of divergence, we obtain estimates of the mutation rate that are within experimental estimates and we also obtain substantially older divergence times within the phylogeny [Current Zoology 61 (5): 874-885, 2015].
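The direction of the rate bias described in the abstract above can be seen in a much simpler setting than the paper's Bayesian analysis. The sketch below assumes two species, a diploid ancestral population of constant effective size N, and a perfectly known species split time. Under those assumptions two sampled sequences coalesce on average 2N generations before the split, so a rate computed as if sequences diverged exactly at the split is inflated. The function names and the numbers are hypothetical and are not taken from the paper.

```python
def expected_gene_divergence(species_split_gen, ancestral_ne):
    """Expected time (in generations) back to the common ancestor of two
    sequences sampled from different species: the species split time plus
    the mean coalescent waiting time of 2N generations in the ancestor."""
    return species_split_gen + 2.0 * ancestral_ne


def rate_inflation(species_split_gen, ancestral_ne):
    """Factor by which a rate estimate is inflated when the observed
    sequence divergence is attributed entirely to the time since the
    species split (that is, when the coalescent is ignored)."""
    gene_time = expected_gene_divergence(species_split_gen, ancestral_ne)
    return gene_time / species_split_gen


# Hypothetical values: a split 100,000 generations ago and an ancestral
# effective population size of 50,000.
gene_time = expected_gene_divergence(100_000, 50_000)
inflation = rate_inflation(100_000, 50_000)
```

The inflation factor is 1 + 2N/t, so it grows with ancestral population size and shrinks for older splits, which matches the abstract's statement that errors are largest when ancestral populations are large.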
Funding: Funded by the National Natural Science Foundation of China (NSFC32020103005), the Third Xinjiang Scientific Expedition and Research (XIKK) (2022xjkk0205), and the Second Tibetan Plateau Scientific Expedition and Research (2019QZKK0501).
Abstract: Environmentally heterogeneous mountains provide opportunities for rapid diversification and speciation. The family Prunellidae (accentors) is a group of birds comprising primarily mountain specialists that have recently radiated across the Palearctic region. This rapid diversification poses challenges to resolving their phylogeny. Herein we sequenced the complete mitogenomes and estimated the phylogeny using all 12 currently recognized species of Prunellidae (28 individuals). We reconstructed the mitochondrial genome phylogeny using 13 protein-coding genes of 12 species and 2 Eurasian Tree Sparrows (Passer montanus). Phylogenetic relationships were estimated using a suite of analyses: maximum likelihood, maximum parsimony and the coalescent-based SVDquartets. Divergence times were estimated by implementing a Bayesian relaxed clock model in BEAST2. Based on the BEAST time-calibrated tree, we implemented an ancestral area reconstruction using RASP v.4.3. Our phylogenies based on the maximum likelihood, maximum parsimony and SVDquartets approaches support a clade of large-sized accentors (subgenus Laiscopus) as sister to all other, small-sized accentors (subgenus Prunella). In addition, the trees also support the sister relationship of P. immaculata and P. rubeculoides + P. atrogularis with 100% bootstrap support, but the relationships among the remaining eight species in the Prunella clade are poorly resolved. These species cluster in different positions in the three phylogenetic trees and the nodes are often poorly supported. The five nodes separating the seven species diverged simultaneously within less than half a million years (i.e., between 2.71 and 3.15 million years ago), suggesting that the recent radiation is likely responsible for rampant incomplete lineage sorting and gene tree conflicts. Ancestral area reconstruction indicates a central Palearctic region origin for Prunellidae. Our study highlights that whole mitochondrial genome phylogeny can resolve major lineages within Prunellidae but is not sufficient to fully resolve t