Funding: Partially funded by the National Science Foundation, the United States (Grant Nos. DBI 1661348 and ISO 2029933); the United States Department of Agriculture–National Institute of Food and Agriculture, the United States (Hatch Project No. 1014919, Grant Nos. 2018-70005-28792, 2019-67013-29171, and 2020-67021-32460); the Washington Grain Commission, the United States (Endowment and Grant Nos. 126593 and 134574); the Sichuan Science and Technology Program, China (Grant Nos. 2021YJ0269 and 2021YJ0266); the Program of Chinese National Beef Cattle and Yak Industrial Technology System, China (Grant No. CARS-37); and the Fundamental Research Funds for the Central Universities, China (Southwest Minzu University, Grant No. 2020NQN26).
Abstract: Genome-wide association study (GWAS) and genomic prediction/selection (GP/GS) are the two essential enterprises in genomic research. Because genomic and phenotypic data are large and complex, analytical methods and their associated software packages are advanced frequently. GAPIT, the genomic association and prediction integrated tool, is a widely used R package. The first version was released to the public in 2012 with implementations of the general linear model (GLM), mixed linear model (MLM), compressed MLM (CMLM), and genomic best linear unbiased prediction (gBLUP). The second version was released in 2016 with several new implementations, including enriched CMLM (ECMLM) and settlement of MLMs under progressively exclusive relationship (SUPER). All GWAS methods in those two versions are based on single-locus tests. The current release, GAPIT version 3, implements for the first time three multi-locus test methods: multiple loci mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK). Additionally, two GP/GS methods were implemented based on CMLM (named compressed BLUP; cBLUP) and SUPER (named SUPER BLUP; sBLUP). These new implementations not only boost statistical power for GWAS and prediction accuracy for GP/GS, but also improve computing speed and increase the capacity to analyze big genomic data. Here, we document the current upgrade of GAPIT by describing the selection of the recently developed methods, their implementations, and potential impact. All documents, including source code, user manual, demo data, and tutorials, are freely available at the GAPIT website (http://zzlab.net/GAPIT).
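As an illustration of what a single-locus test means, the following minimal Python sketch regresses a phenotype on one marker at a time and reports a t-statistic per marker. It is not GAPIT code and does not reproduce GLM, MLM, or any other GAPIT model; the function name `single_locus_scan` and the toy data are hypothetical, and covariates, kinship, and p-value calculation are deliberately left out.

```python
import math


def single_locus_scan(genotypes, phenotype):
    """Test each marker on its own with a simple linear regression.

    genotypes: list of markers; each marker is a list of allele counts
               (0, 1, or 2), one value per individual.
    phenotype: list of trait values, one per individual (needs more than 2).
    Returns one t-statistic per marker (0.0 for monomorphic markers).
    """
    n = len(phenotype)
    y_mean = sum(phenotype) / n
    syy = sum((y - y_mean) ** 2 for y in phenotype)
    t_stats = []
    for marker in genotypes:
        x_mean = sum(marker) / n
        sxx = sum((x - x_mean) ** 2 for x in marker)
        if sxx == 0.0:
            # A marker with no variation carries no association signal.
            t_stats.append(0.0)
            continue
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(marker, phenotype))
        slope = sxy / sxx
        # Residual sum of squares of the one-marker regression.
        rss = max(syy - slope * sxy, 0.0)
        std_err = math.sqrt(rss / (n - 2) / sxx)
        if std_err == 0.0:
            t_stats.append(math.copysign(math.inf, slope))
        else:
            t_stats.append(slope / std_err)
    return t_stats


# Hypothetical toy data: 8 individuals, 3 markers; the trait tracks marker 0.
toy_genotypes = [
    [0, 0, 1, 1, 2, 2, 0, 2],
    [0, 1, 0, 1, 0, 1, 2, 2],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
toy_phenotype = [0.1, -0.2, 2.3, 1.9, 4.2, 3.7, 0.0, 4.1]
toy_t = single_locus_scan(toy_genotypes, toy_phenotype)
```

Each marker is evaluated without reference to the others, which is the defining property of a single-locus scan. Multi-locus methods instead fit several markers jointly.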
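Abstract: Genome-wide detection of SNP variation is the foundation for genomic selection and for accurately measuring the genetic diversity of populations. Following the development of 60 K and 600 K chicken SNP arrays abroad, the Institute of Animal Sciences of the Chinese Academy of Agricultural Sciences (Beijing) and other institutions, responding to the current state and needs of domestic chicken breeding and the conservation of local germplasm resources, independently developed cost-effective genotyping arrays such as the "Jingxin-1" (京芯一号) 55 K SNP array. The features of the array include: (1) it contains genetic variation specific to Chinese indigenous chicken breeds while also covering genomic information from foreign commercial chicken breeds; (2) it integrates a large number of SNPs associated with functional genes; (3) its SNPs are evenly distributed across the genome; and (4) its density is moderate and it is cost-effective. Practical application has shown that chicken genome-wide SNP arrays can play an important role in genomic selection breeding, analysis of germplasm diversity, kinship identification, genome-wide association studies, and gene mapping. Focusing on the "Jingxin-1" 55 K SNP array, this article reviews the latest progress in the development and application of chicken genome-wide SNP arrays.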
Funding: National Key Research and Development Program of China (2016YFD0101803); Central Public-interest Scientific Institution Basal Research Fund (Y2020PT20); Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-XTCX2016009); Shijiazhuang Science and Technology Incubation Program (191540089A); Hebei Innovation Capability Enhancement Project (19962911D); Project of Hainan Yazhou Bay Seed Laboratory (B21HJ0223); Department of Science and Technology of Ningxia Project (NXNYYZ202001). Research activities at CIMMYT were supported by the Bill and Melinda Gates Foundation and the CGIAR Research Program MAIZE.
Abstract: The first paradigm of plant breeding involves direct selection based on phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
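One common way to write the dependence of phenotype on genotype, envirotype, and their interaction is the additive two-way model below. It is a textbook formulation given here as an assumption, not the iGEP model itself. The indices $i$ (genotype) and $j$ (environment), the overall mean $\mu$, and the residual $\varepsilon_{ij}$ are introduced only for illustration.

```latex
P_{ij} = \mu + G_i + E_j + (GE)_{ij} + \varepsilon_{ij}
```

Under this reading, a prediction that uses only $G_i$ ignores both $E_j$ and $(GE)_{ij}$, which is the gap that adding envirotypic data is meant to close.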
Funding: National Key R&D Program of China (2021YFD1201200); National Science Foundation of China (32022064); Project of Hainan Yazhou Bay Seed Lab (B21HJ0223); Innovation Program of the Chinese Academy of Agricultural Sciences.
Abstract: Genomic prediction is an effective way to accelerate the rate of agronomic trait improvement in plants. Traditional methods typically use linear regression models with clear assumptions; such methods are unable to capture the complex relationships between genotypes and phenotypes. Non-linear models (e.g., deep neural networks) have been proposed as a superior alternative to linear models because they can capture complex non-additive effects. Here we introduce a deep learning (DL) method, deep neural network genomic prediction (DNNGP), for integration of multi-omics data in plants. We trained DNNGP on four datasets and compared its performance with methods built with five classic models: genomic best linear unbiased prediction (GBLUP); two methods based on a machine learning (ML) framework, light gradient boosting machine (LightGBM) and support vector regression (SVR); and two methods based on a DL framework, deep learning genomic selection (DeepGS) and deep learning genome-wide association study (DLGWAS). DNNGP is novel in five ways. First, it can be applied to a variety of omics data to predict phenotypes. Second, the multilayered hierarchical structure of DNNGP dynamically learns features from raw data, avoiding overfitting and improving the convergence rate by using a batch normalization layer, early stopping, and rectified linear unit (ReLU) activation functions. Third, on small datasets DNNGP produced results competitive with those of the other five methods, and on large-scale breeding data it showed greater prediction accuracy than the other methods. Fourth, the computation time required by DNNGP was comparable with that of commonly used methods and up to 10 times faster than DeepGS. Fifth, hyperparameters can easily be batch tuned on a local machine. DNNGP is superior to GBLUP, LightGBM, SVR, DeepGS, and DLGWAS, which are widely used genomic selection (GS) methods. Moreover, DNNGP can generate robust assessments from diverse datasets, including omics data, and quickly incorporate complex and large datas
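Early stopping, one of the overfitting controls named in the abstract above, can be sketched generically in Python. This is not DNNGP's implementation. The function name `early_stopping` and the `patience` and `min_delta` parameters are assumptions chosen for illustration, and the validation losses are made-up numbers.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Decide where training would stop, given per-epoch validation losses.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs.
    Returns (best_epoch, stop_epoch), both 0-based.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best model: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Patience was never exhausted; training ran through the last epoch.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses for seven epochs.
toy_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6]
short_patience = early_stopping(toy_losses, patience=3)
long_patience = early_stopping(toy_losses, patience=10)
```

With a short patience the run halts before the late improvement at the final epoch, which shows the trade-off: a smaller `patience` saves computation but can stop before a better model is reached.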
Funding: Supported by grants from the National High Technology Research and Development Program of China (2014AA10A601-5); the National Key Research and Development Program of China (2016YFD0100303); the National Natural Science Foundation of China (91535103); the Natural Science Foundations of Jiangsu Province (BK20150010); the Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJA210005); the Open Research Fund of State Key Laboratory of Hybrid Rice (Wuhan University) (KF201701); the Science and Technology Innovation Fund Project in Yangzhou University (2016CXJ021); the Priority Academic Program Development of Jiangsu Higher Education Institutions; and the Innovative Research Team of Universities in Jiangsu Province.
Abstract: With marker and phenotype information from observed populations, genomic selection (GS) can be used to establish associations between markers and phenotypes. It aims to use genome-wide markers to estimate the effects of all loci and thereby predict the genetic values of untested populations, so as to achieve more comprehensive and reliable selection and to accelerate genetic progress in crop breeding. GS models usually face the problem that the number of markers is much higher than the number of phenotypic observations. To overcome this issue and improve prediction accuracy, many models and algorithms, including GBLUP, Bayesian methods, and machine learning, have been employed for GS. As active topics in GS research, the estimation of non-additive genetic effects and the combined analysis of multiple traits or multiple environments are also important for improving the accuracy of prediction. In recent years, crop breeding has taken advantage of the development of GS. This paper reviews the principles and characteristics of currently popular GS methods and the research progress in applying these methods to crop improvement.
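The "more markers than observations" problem can be made concrete with ridge regression on markers, a shrinkage approach closely related to GBLUP. The sketch below is a dependency-free illustration under stated assumptions, not a method taken from the reviewed paper. The function names, the penalty `lam`, and the toy data are hypothetical, and in practice the penalty would be estimated from variance components or by cross-validation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_marker_effects(markers, phenotype, lam=1.0):
    """Estimate all marker effects jointly with an L2 penalty `lam` (> 0).

    markers: n individuals x p markers (allele counts), where p may exceed n.
    Uses the n x n form  effects = Z' (Z Z' + lam I)^-1 (y - mean(y)),
    with Z the column-centered marker matrix.
    Returns (y_mean, col_means, effects).
    """
    n, p = len(markers), len(markers[0])
    col_means = [sum(row[j] for row in markers) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in markers]
    y_mean = sum(phenotype) / n
    y_centered = [y - y_mean for y in phenotype]
    kernel = [
        [sum(z[i][m] * z[j][m] for m in range(p)) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve_linear_system(kernel, y_centered)
    effects = [sum(z[i][j] * alpha[i] for i in range(n)) for j in range(p)]
    return y_mean, col_means, effects


def predict(marker_row, y_mean, col_means, effects):
    """Predict the genetic value of one individual from its marker row."""
    return y_mean + sum(
        (g - m) * e for g, m, e in zip(marker_row, col_means, effects)
    )


# Hypothetical toy data: 4 phenotyped individuals, 6 markers (p > n).
toy_markers = [
    [0, 1, 2, 0, 1, 2],
    [1, 1, 0, 2, 0, 2],
    [2, 0, 1, 1, 2, 0],
    [0, 2, 2, 1, 0, 1],
]
toy_phenotype = [1.0, 2.5, 3.0, 1.5]
toy_fit = ridge_marker_effects(toy_markers, toy_phenotype, lam=1.0)
toy_prediction = predict([1, 0, 2, 1, 1, 0], *toy_fit)
```

Ordinary least squares has no unique solution when markers outnumber observations. The penalty makes the system solvable and shrinks every marker effect toward zero, so all loci can be fitted at once and then used to score an unphenotyped individual.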
Funding: Participation of José Luis Araus and María Dolors Serret was supported by the Spanish Project AGL2010-20180 (subprogram AGR) and the FP7 European Project OPTICHINA (266045).
Abstract: Genomic selection (GS) and high-throughput phenotyping have recently been captivating the interest of the crop breeding community from both the public and private sectors worldwide. Both approaches promise to revolutionize the prediction of complex traits, including growth, yield, and adaptation to stress. Whereas high-throughput phenotyping may help to improve understanding of crop physiology, the most powerful techniques for high-throughput field phenotyping are empirical rather than analytical and, in that respect, comparable to genomic selection. Although the two methodological approaches represent the extremes of what is understood as the breeding process (phenotype versus genome), they both treat the targeted traits (e.g., grain yield, growth, phenology, plant adaptation to stress) as a black box instead of dissecting them into a set of secondary (i.e., physiological) traits putatively related to the target trait. GS and high-throughput phenotyping share an empirical approach that enables breeders to use a genome profile or phenotype without understanding the underlying biology. This short review discusses the main aspects of both approaches and focuses on the case of genomic selection of maize flowering traits, and on near-infrared spectroscopy (NIRS) and plant spectral reflectance as high-throughput field phenotyping methods for complex traits such as crop growth and yield.