CRISPR/Cas9 genome targeting systems have been applied to a variety of species. However, most CRISPR/Cas9 systems reported for plants can only modify one or a few target sites. Here, we report a robust CRISPR/Cas9 vector system, utilizing a plant codon-optimized Cas9 gene, for convenient and high-efficiency multiplex genome editing in monocot and dicot plants. We designed PCR-based procedures to rapidly generate multiple sgRNA expression cassettes, which can be assembled into the binary CRISPR/Cas9 vectors in one round of cloning by Golden Gate ligation or Gibson Assembly. With this system, we edited 46 target sites in rice with an average mutation rate of 85.4%, mostly in biallelic and homozygous status. We reasoned that about 16% of the homozygous mutations in rice were generated through the non-homologous end-joining mechanism followed by homologous recombination-based repair. We also obtained uniform biallelic, heterozygous, homozygous, and chimeric mutations in Arabidopsis T1 plants. The targeted mutations in both rice and Arabidopsis were heritable. We provide examples of loss-of-function gene mutations in T0 rice and T1 Arabidopsis plants by simultaneous targeting of multiple (up to eight) members of a gene family, multiple genes in a biosynthetic pathway, or multiple sites in a single gene. This system provides a versatile toolbox for studying functions of multiple genes and gene families in plants for basic research and genetic improvement.
The genome sequence of the Severe Acute Respiratory Syndrome (SARS)-associated virus provides essential information for the identification of pathogen(s), exploration of etiology and evolution, interpretation of transmission and pathogenesis, development of diagnostics, prevention by future vaccination, and treatment by developing new drugs. We report the complete genome sequence and comparative analysis of an isolate (BJ01) of the coronavirus that has been recognized as a pathogen for SARS. The genome is 29,725 nt in size and has 11 ORFs (open reading frames). It is composed of a stable region encoding an RNA-dependent RNA polymerase (composed of 2 ORFs) and a variable region representing 4 CDSs (coding sequences) for viral structural genes (the S, E, M, and N proteins) and 5 PUPs (putative uncharacterized proteins). Its gene order is identical to that of other known coronaviruses. Sequence alignment with all known RNA viruses places this virus as a member of the family Coronaviridae. Thirty putative substitutions have been identified by comparative analysis of the 5 SARS-associated virus genome sequences in GenBank. Fifteen of them lead to possible amino acid changes (non-synonymous mutations) in the proteins. Three amino acid changes, with predicted alteration of physical and chemical features, have been detected in the S protein, which is postulated to be involved in the immunoreactions between the virus and its host. Two amino acid changes have been detected in the M protein, which could be related to viral envelope formation. Phylogenetic analysis suggests the possibility of a non-human origin of the SARS-associated viruses but provides no evidence that they are man-made. Further efforts should focus on identifying the etiology of the SARS-associated virus and conclusively ruling out the existence of other possible SARS-related pathogen(s).
Tea is the world's oldest and most popular caffeine-containing beverage with immense economic, medicinal, and cultural importance. Here, we present the first high-quality nucleotide sequence of the repeat-rich (80.9%), 3.02-Gb genome of the cultivated tea tree Camellia sinensis. We show that the extraordinarily large genome size of the tea tree results from the slow, steady, and long-term amplification of a few LTR retrotransposon families. In addition to a recent whole-genome duplication event, lineage-specific expansions of genes associated with flavonoid metabolic biosynthesis were discovered, which enhance catechin production, terpene enzyme activation, and stress tolerance, important features for tea flavor and adaptation. We demonstrate an independent and rapid evolution of the tea caffeine synthesis pathway relative to cacao and coffee. A comparative study among 25 Camellia species revealed that higher expression levels of most flavonoid- and caffeine- but not theanine-related genes contribute to the increased production of catechins and caffeine and thus enhance tea-processing suitability and tea quality. These novel findings pave the way for further metabolomic and functional genomic refinement of characteristic biosynthesis pathways and will help develop a more diversified set of tea flavors that would eventually satisfy and attract more tea drinkers worldwide.
Precise and straightforward methods to edit the plant genome are much needed for functional genomics and crop improvement. Recently, RNA-guided genome editing using the bacterial Type II clustered regularly interspaced short palindromic repeats (CRISPR)-associated nuclease (Cas) has emerged as an efficient tool for genome editing in microbial and animal systems. Here, we report genome editing and targeted gene mutation in plants via the CRISPR-Cas9 system. Three guide RNAs (gRNAs) with a 20-22-nt seed region were designed to pair with distinct rice genomic sites that are followed by the protospacer-adjacent motif (PAM). The engineered gRNAs were shown to direct the Cas9 nuclease for precise cleavage at the desired sites and to introduce mutations (insertions or deletions) through error-prone non-homologous end-joining DNA repair. By analyzing the RNA-guided genome-editing events, the mutation efficiency at these target sites was estimated to be 3-8%. In addition, an off-target effect of an engineered gRNA-Cas9 was found at an imperfectly paired genomic site, but with lower genome-editing efficiency than at the perfectly matched site. Further analysis suggests that the position of mismatches between the gRNA seed and the target DNA is an important determinant of gRNA-Cas9 targeting specificity, and that specific gRNAs could be designed to target more than 90% of rice genes. Our results demonstrate that the CRISPR-Cas system can be exploited as a powerful tool for gene targeting and precise genome editing in plants.
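The target-site criterion in the abstract above (a guide-length protospacer immediately followed by a PAM) can be illustrated with a short sketch. It rests on assumptions that are not stated in the abstract: the PAM is taken to be NGG (the motif recognized by the commonly used Streptococcus pyogenes Cas9), only the forward strand is scanned, and the function name `find_target_sites` and its `guide_len` parameter are hypothetical. It is not the authors' design procedure and does no off-target scoring.

```python
import re


def find_target_sites(seq, guide_len=20):
    """List candidate Cas9 target sites on the forward strand.

    Assumption: the PAM is NGG (S. pyogenes Cas9). Each result is a tuple
    (start_index, protospacer, pam), where the protospacer is guide_len nt
    long and the PAM is the 3 nt immediately downstream of it.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence, not a real rice locus.
    demo = "ACGT" * 5 + "AGG" + "CC"
    for start, protospacer, pam in find_target_sites(demo):
        print(start, protospacer, pam)
```

A fuller scan would also search the reverse complement and then rank candidates by mismatch position relative to the seed region, which the abstract identifies as a determinant of specificity.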
DNA double-strand breaks (DSBs) are critical lesions that can result in cell death or a wide variety of genetic alterations, including large- or small-scale deletions, loss of heterozygosity, translocations, and chromosome loss. DSBs are repaired by non-homologous end-joining (NHEJ) and homologous recombination (HR), and defects in these pathways cause genome instability and promote tumorigenesis. DSBs arise from endogenous sources, including reactive oxygen species generated during cellular metabolism, collapsed replication forks, and nucleases, and from exogenous sources, including ionizing radiation and chemicals that directly or indirectly damage DNA and are commonly used in cancer therapy. The DSB repair pathways appear to compete for DSBs, but the balance between them differs widely among species, between different cell types of a single species, and during different cell cycle phases of a single cell type. Here we review the factors that regulate DSB repair by NHEJ and HR in yeast and higher eukaryotes. These factors include regulated expression and phosphorylation of repair proteins, chromatin modulation of repair factor accessibility, and the availability of homologous repair templates. While most DSB repair proteins appear to function exclusively in NHEJ or HR, a number of proteins influence both pathways, including the MRE11/RAD50/NBS1(XRS2) complex, BRCA1, histone H2AX, PARP-1, RAD18, DNA-dependent protein kinase catalytic subunit (DNA-PKcs), and ATM. DNA-PKcs plays a role in mammalian NHEJ, but it also influences HR through a complex regulatory network that may involve crosstalk with ATM and the regulation of at least 12 proteins involved in HR that are phosphorylated by DNA-PKcs and/or ATM.
The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, which provides data storage and sharing services for worldwide scientific communities. Considering explosive data growth with diverse data types, here we present the GSA family by expanding into a set of resources for raw data archive with different purposes, namely, GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix/). Compared with the 2017 version, GSA has been significantly updated in data model, online functionalities, and web interfaces. GSA-Human, as a new partner of GSA, is a data repository specialized in human genetics-related data with controlled access and security. OMIX, as a critical complement to the two resources mentioned above, is an open archive for miscellaneous data. Together, all these resources form a family of resources dedicated to archiving explosive data with diverse types, accepting data submissions from all over the world, and providing free open access to all publicly available data in support of worldwide research activities.
Dendrobium officinale Kimura et Migo is a traditional Chinese orchid herb that has both ornamental value and a broad range of therapeutic effects. Here, we report the first de novo assembled 1.35-Gb genome sequence for D. officinale, obtained by combining second-generation Illumina HiSeq 2000 and third-generation PacBio sequencing technologies. We found that orchids have a complete inflorescence gene set and have some specific inflorescence genes. We observed gene expansion in gene families related to fungus symbiosis and drought resistance. We analyzed biosynthesis pathways of medicinal components of D. officinale and found extensive duplication of SPS and SuSy genes, which are related to polysaccharide generation, and that the pathway of D. officinale alkaloid synthesis could be extended to generate 16-epivellosimine. The D. officinale genome assembly demonstrates a new approach to deciphering large complex genomes and, as an important orchid species and a traditional Chinese medicine, the D. officinale genome will facilitate future research on the evolution of orchid plants, as well as the study of medicinal components and potential genetic breeding of the dendrobe.
Tartary buckwheat (Fagopyrum tataricum) is an important pseudocereal crop that is strongly adapted to growth in adverse environments. Its gluten-free grain contains complete proteins with a well-balanced composition of essential amino acids and is a rich source of beneficial phytochemicals that provide significant health benefits. Here, we report a high-quality, chromosome-scale Tartary buckwheat genome sequence of 489.3 Mb that is assembled by combining whole-genome shotgun sequencing of both Illumina short reads and single-molecule real-time long reads, sequence tags of a large DNA insert fosmid library, Hi-C sequencing data, and BioNano genome maps. We annotated 33,366 high-confidence protein-coding genes based on expression evidence. Comparisons of the intra-genome with the sugar beet genome revealed an independent whole-genome duplication that occurred in the buckwheat lineage after they diverged from the common ancestor, which was not shared with rosids or asterids. The reference genome facilitated the identification of many new genes predicted to be involved in rutin biosynthesis and regulation, aluminum stress resistance, and in drought and cold stress responses. Our data suggest that Tartary buckwheat's ability to tolerate high levels of abiotic stress is attributed to the expansion of several gene families involved in signal transduction, gene regulation, and membrane transport. The availability of these genomic resources will facilitate the discovery of agronomically and nutritionally important genes and genetic improvement of Tartary buckwheat.
Tea plant is an important economic crop, which is used to produce the world's oldest and most widely consumed tea beverages. Here, we present a high-quality reference genome assembly of the tea plant (Camellia sinensis var. sinensis) consisting of 15 pseudo-chromosomes. LTR retrotransposons (LTR-RTs) account for 70.38% of the genome, and we present evidence that LTR-RTs play critical roles in genome size expansion and the transcriptional diversification of tea plant genes through preferential insertion in promoter regions and introns. Genes, particularly those coding for terpene biosynthesis proteins, associated with tea aroma and stress resistance were significantly amplified through recent tandem duplications and exist as gene clusters in the tea plant genome. Phylogenetic analysis of the sequences of 81 tea plant accessions with diverse origins revealed three well-differentiated tea plant populations, supporting the proposition for the southwest origin of the Chinese cultivated tea plant and its later spread to western Asia through introduction. Domestication and modern breeding left significant signatures on hundreds of genes in the tea plant genome, particularly those associated with tea quality and stress resistance. The genomic sequences of the reported reference and resequenced tea plant accessions provide valuable resources for future functional genomics study and molecular breeding of improved cultivars of tea plants.
Funding (Camellia sinensis tea tree genome study): This work was supported by the Yunnan Innovation Team Project, the Hundreds Oversea Talents Program of Yunnan Province, the Top Talents Program of Yunnan Province (Grant 20080A009), the Key Project of the Natural Science Foundation of Yunnan Province (201401 PC00397), National Science Foundation of China (U0936603), Key Project of Natural Science Foundation of Yunnan Province (2008CC016), Frontier Grant of Kunming Institute of Botany, CAS (672705232515), and Hundreds Talents Program of Chinese Academy of Sciences (CAS) (to L.G.).
Funding (Genome Sequence Archive family study): This work was supported by grants from the National Key R&D Program of China (Grant No. 2017YFC0907502 to ZZ); the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDB38060100 and XDB38030200 to YB; XDB38050300 to WZ; XDB38030400 to JX; XDA19050302 to ZZ); the National Key R&D Program of China (Grant Nos. 2016YFC0901603 to WZ; 2017YFC1201202 to YW; 2020YFC0847000 and 2018YFD1000505 to WZ; 2016YFE0206600 to YB); the 13th Five-year Informatization Plan of the Chinese Academy of Sciences (Grant No. XXH13505-05 to YB); the Genomics Data Center Construction of the Chinese Academy of Sciences (Grant No. XXH-13514-0202 to YB); the Open Biodiversity and Health Big Data Programme of the International Union of Biological Sciences (to YB); the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2020-07 to YB); the National Natural Science Foundation of China (Grant Nos. 32030021 and 31871328 to ZZ); and the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008 to ZZ).
Funding (tea plant reference genome study): This work was supported by the National Key Research and Development Program of China (2018YFD1000601 and 2019YFD1001601), the National Natural Science Foundation of China (31800180), the Natural Science Foundation of Anhui Province of China (1908085MC75), the China Postdoctoral Science Foundation (2017M621992), and the special funds for tea germplasm garden construction (2060502 and 201834040003).