Funding: Supported by grants from the Key Program of the Chinese Academy of Sciences (Grant No. KJZD-EW-L14, awarded to CZ) and the National Key R&D Program of China from the Ministry of Science and Technology of China (Grant No. 2016YFB0201702, awarded to JX; Grant Nos. 2016YFC0901701 and 2018YFC0910700, awarded to XF).
Abstract: Unraveling the genetic mechanisms of disease and physiological traits requires comprehensive sequencing analysis of large samples from Chinese populations. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy people from most areas of China. Given that the two existing reference genomes for Han Chinese (YH and HX1) were both from the south, we constructed NH1.0, a new reference genome from a northern individual, by combining PacBio sequencing, 10× Genomics, and Bionano mapping. Using this integrated approach, we obtained an N50 scaffold size of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 with YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of 597 participants and identified 24.85 million (M) single nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In the association analysis with collected phenotypes, we found that the T allele of rs1549293 in KAT8 was significantly correlated with waist circumference in northern Han males. Moreover, significant genetic diversity in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate, vitamin B12, or lipid metabolism, was observed between northerners and southerners. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677T), we hypothesize that a “comfort” zone for a high frequency of 677T exists between latitudes 35 and 45 degrees north. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific data sets of genetic variants for use in personalized and precision medicine.
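The abstract above reports assembly contiguity as an N50 scaffold size. As a reminder of what that statistic means, here is a minimal sketch of the usual definition: the length of the scaffold at which the running total, counted from the longest scaffold down, first reaches half of the total assembly length. The function name and the toy lengths are assumptions for illustration only; they are not data from the NH1.0 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, lengths in Mb (not NH1.0 data).
toy_lengths_mb = [50, 40, 30, 20, 10]
print(n50(toy_lengths_mb))  # prints 40
```

With the toy lengths, the total is 150 Mb, so the threshold is 75 Mb; the two longest scaffolds (50 + 40) are the first to reach it, which gives an N50 of 40 Mb.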
Funding: Supported by the earmarked fund for the China Agriculture Research System (CARS-35) and the National Natural Science Foundation of China (32022078); also supported by the National Supercomputer Centre in Guangzhou.
Abstract: Genomic selection (GS) has been widely used in livestock and has greatly accelerated the genetic progress of complex traits. Population size is one of the significant factors affecting prediction accuracy, but it is limited in purebred populations. Compared with directly combining two uncorrelated purebred populations to extend the reference population size, it might be more meaningful to incorporate correlated crossbreds into the reference population for genomic prediction. In this study, we simulated purebred offspring (PAS and PBS) and crossbred offspring (CAB) based on real genotype data of two base purebred populations (PA and PB) to evaluate the performance of genomic selection on purebreds while incorporating crossbred information. The results showed that selecting key crossbred individuals by maximizing the expected genetic relationship (REL) was better than the other methods (selecting the individuals closest to or farthest from the purebred population, CP/FP) in terms of prediction accuracy. Furthermore, the prediction accuracy of a reference population combining PA and CAB was significantly better than that of one based on PA only, and was similar to that of combining PA and PAS. Moreover, the rank correlation between the multiple of the increased relationship (MIR) and reliability improvement was 0.60-0.70. However, for individuals with low correlation (Cor(Pi, PA or B)), the reliability improvement was significantly lower than for other individuals. Our findings suggest that incorporating crossbreds into the purebred reference population could improve the performance of genomic prediction compared with using the purebred population only. The genetic relationship between the purebred and crossbred populations is a key factor determining the increase in reliability when a crossbred population is incorporated into genomic prediction for purebred individuals.
Funding: Supported by the China Postdoctoral Science Foundation (2022M722020) to Z.L.; the Key Project of Scientific Research Program of Shaanxi Provincial Education Department (23JY020) to Z.L.; the Natural Science Basic Research Program of Shaanxi (2024JCYBMS-152) to Z.L.; the Key Projects of Shaanxi University of Technology (SLGKYXM2302) to Z.L.; the Opening Foundation of Shaanxi University of Technology (SLGPT2019KF02-02) to Z.L.; the Natural Science Basic Research Program of Shaanxi (2020JM-280) to G.L.; the Fundamental Research Funds for the Central Universities (GK201902008) to G.L.; and the National Natural Science Foundation of China (31570378) to X.M.
Abstract: Horseshoe bats (genus Rhinolophus, family Rhinolophidae) represent an important group within chiropteran phylogeny due to their distinctive traits, including constant high-frequency echolocation, rapid karyotype evolution, and a unique immune system. Advances in evolutionary biology, supported by high-quality reference genomes and comprehensive whole-genome data, have significantly enhanced our understanding of species origins, speciation mechanisms, adaptive evolutionary processes, and phenotypic diversity. However, genomic research and understanding of the evolutionary patterns of Rhinolophus are severely constrained by limited data, with only a single published genome, that of R. ferrumequinum, currently available. In this study, we constructed a high-quality chromosome-level reference genome for the intermediate horseshoe bat (R. affinis). Comparative genomic analyses revealed potential genetic characteristics associated with virus tolerance in Rhinolophidae. Notably, we observed expansions in several immune-related gene families and identified various genes functionally associated with the SARS-CoV-2 signaling pathway, DNA repair, and apoptosis, which displayed signs of rapid evolution. In addition, we observed an expansion of the major histocompatibility complex class II (MHC-II) region and a higher copy number of the HLA-DQB2 gene in horseshoe bats compared to other chiropteran species. Based on whole-genome resequencing and population genomic analyses, we identified multiple candidate loci (e.g., GLI3) associated with variations in echolocation call frequency across R. affinis subspecies. This research not only expands our understanding of the genetic characteristics of the Rhinolophus genus but also establishes a valuable foundation for future research.
Funding: Supported by the China Agriculture Research System of MOF and MARA (CARS-35), the National Natural Science Foundation of China (32072696, 31790414 and 31601916), and the Fundamental Research Funds for the Central Universities (2662019PY011).
Abstract: Genotype imputation has become an indispensable part of genomic data analysis. In recent years, imputation based on a multi-breed reference population has received more attention, but relevant studies in pigs are scarce. In this study, we used the Illumina Porcine SNP50 Bead Chip to investigate how imputation accuracy varies with several influencing factors and compared the imputation performance of four commonly used imputation software programs. The results indicated that imputation accuracy increased as the validation population marker density, the reference population sample size, or the minor allele frequency (MAF) increased. However, imputation accuracy decreased to some extent when the pig reference population was a mixed group of multiple breeds or lines. Considering both imputation accuracy and running time, Beagle 4.1 and FImpute are excellent choices among the four software packages tested. This work visually presents the impacts of these influencing factors on imputation and provides a reference for formulating reasonable imputation strategies in actual pig breeding.
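The abstract does not say how imputation accuracy was measured. As an illustration only, the sketch below computes two measures that are commonly used for this purpose, the genotype concordance rate and the Pearson correlation between true and imputed genotypes, together with the minor allele frequency of a marker. Genotypes are assumed to be coded as 0, 1, or 2 copies of one allele. All function names and the toy genotypes are hypothetical and are not taken from the study.

```python
from math import sqrt


def concordance(true_g, imputed_g):
    """Fraction of genotypes that were imputed exactly right."""
    if len(true_g) != len(imputed_g) or not true_g:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_g, imputed_g))
    return matches / len(true_g)


def pearson_r(x, y):
    """Pearson correlation; returns NaN if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def minor_allele_frequency(genotypes):
    """MAF of one marker from 0/1/2 allele counts in diploid individuals."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


# Hypothetical genotypes at one marker for eight animals (not study data).
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_genotypes = [0, 1, 2, 2, 0, 2, 1, 0]

print(concordance(true_genotypes, imputed_genotypes))  # prints 0.875
print(pearson_r(true_genotypes, imputed_genotypes))
print(minor_allele_frequency(true_genotypes))  # prints 0.4375
```

Concordance tends to look high for low-MAF markers even when the rare genotypes are imputed poorly, which is one reason the correlation is often reported alongside it and accuracy is often summarized per MAF class.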