Funding: Supported by the National Key Research and Development Program of China (2021YFD1200104), the National Natural Science Foundation of China (31991210), the Strategic International Science and Technology Innovation Collaboration Project (2020YFE0202300), and the 2115 Talent Development Program of China Agricultural University; also supported by the High-performance Computing Platform of China Agricultural University.
Abstract: Gene regulation is central to all aspects of organism growth, and understanding it using large-scale functional datasets can provide a holistic view of the biological processes controlling complex phenotypic traits in crops. However, the connection between massive functional datasets and trait-associated gene discovery for crop improvement is still lacking. In this study, we constructed a wheat integrative gene regulatory network (wGRN) by combining an updated genome annotation and diverse complementary functional datasets, including gene expression, sequence motif, transcription factor (TF) binding, chromatin accessibility, and evolutionarily conserved regulation. wGRN contains 7.2 million genome-wide interactions covering 5947 TFs and 127439 target genes, which were further verified using known regulatory relationships, condition-specific expression, gene functional information, and experiments. We used wGRN to assign genome-wide genes to 3891 specific biological pathways and to accurately prioritize candidate genes associated with complex phenotypic traits in genome-wide association studies. In addition, wGRN was used to enhance the interpretation of a spike temporal transcriptome dataset to construct high-resolution networks. We further unveiled novel regulators that enhance the power of spike phenotypic trait prediction using machine learning and contribute to the spike phenotypic differences among modern wheat accessions. Finally, we developed an interactive webserver, wGRN (http://wheat.cau.edu.cn/wGRN), for the community to explore gene regulation and discover trait-associated genes. Collectively, this community resource establishes the foundation for using large-scale functional datasets to guide trait-associated gene discovery for crop improvement.
Funding: Financially supported by the Dirección de Investigación Sede Bogotá of the Universidad Nacional de Colombia (Grant No. 201010016738).
Abstract: Recent advances in genomic and post-genomic technologies have provided the opportunity to generate a previously unimaginable amount of information. However, biological knowledge is still needed to improve the understanding of complex mechanisms such as plant immune responses. Better knowledge of this process could improve crop production and management. Here, we used holistic analysis to combine our own microarray and RNA-seq data with public genomic data from Arabidopsis and cassava in order to acquire biological knowledge about the relationships between proteins encoded by immunity-related genes (IRGs) and other genes. This approach was based on a kernel method adapted for the construction of gene networks. The results allowed us to propose a list of new IRGs, and a putative function in the immunity pathway was predicted for each of them. Analysis of the networks revealed that our predicted IRGs are either well documented or recognized in previous co-expression studies. In addition to robust relationships between IRGs, there is evidence suggesting that other cellular processes may also be strongly related to immunity.
Funding: Supported by the National Natural Science Foundation of China (31200911, 31101576), the China Postdoctoral Science Foundation (20100471197, 201104475), and the Research Fund for the Doctoral Program of Higher Education of China (20110146120040).
Abstract: The SQUAMOSA promoter binding protein (SBP)-box genes encode plant-specific transcription factors (TFs) and play important roles in the regulation of plant development. In this study, a genome-wide characterization of this family was conducted in maize (Zea mays). Thirty-one SBP-box genes were identified, distributed across nine chromosomes, and 16 of them were complementary to the mature ZmmiR156 sequences. All the Z. mays SBP (ZmSBP) genes were classified into two clusters with eight subgroups according to the phylogenetic analysis of proteins, which was consistent with the pattern of exon-intron structures. A phylogenetic tree of the ZmSBP, Oryza sativa SBP-like (OsSPL) and Arabidopsis thaliana SBP-like (AtSPL) genes was constructed, and all the SBP-box genes were divided into eight groups, the same as in the classification of the ZmSBP genes alone. Comparison of the expression profiles of all SBP-box genes in these three species indicated that most orthologous genes had similar expression patterns. The results of this study provide a basic understanding of the ZmSBP genes and might facilitate future research to elucidate the function of SBP-box genes in maize.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 30370388, 30670539 and 30770558).
Abstract: It is of great importance to identify new cancer genes from the data of large-scale genome screenings of gene mutations in cancers. Considering that alterations of some essential functions are indispensable for oncogenesis, we define them as cancer functions and select, as their approximations, a group of detailed functions in GO (Gene Ontology) highly enriched with known cancer genes. To evaluate the efficiency of using cancer functions as features to identify cancer genes, we define, among the screened genes, the known protein kinase cancer genes as gold standard positives and the other kinase genes as gold standard negatives. The results show that cancer-associated functions are more efficient in identifying cancer genes than the selection pressure feature. Furthermore, combining cancer functions with the number of non-silent mutations can generate more reliable positive predictions. Finally, at a precision of 0.42, we suggest a list of 46 kinase genes as candidate cancer genes that are annotated to cancer functions and carry at least 3 non-silent mutations.
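The selection rule described in this abstract (annotated to a cancer function, at least 3 non-silent mutations, scored against gold-standard labels) can be illustrated with a minimal sketch. It is not the authors' code: the record type, field names, and toy genes below are hypothetical, and only the threshold of 3 non-silent mutations comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KinaseGene:
    """Hypothetical record for one screened kinase gene."""
    name: str                  # made-up identifier, not a real gene symbol
    in_cancer_function: bool   # annotated to a GO function enriched in cancer genes
    nonsilent_mutations: int   # count of non-silent mutations in the screen
    known_cancer_gene: bool    # gold-standard label (positive or negative)


MIN_NONSILENT = 3  # threshold stated in the abstract


def predict_candidates(genes, min_mut=MIN_NONSILENT):
    """Keep genes that satisfy both features: cancer function and mutation count."""
    return [
        g for g in genes
        if g.in_cancer_function and g.nonsilent_mutations >= min_mut
    ]


def precision(predicted):
    """Fraction of predicted genes that are gold-standard positives."""
    if not predicted:
        return 0.0
    true_positives = sum(g.known_cancer_gene for g in predicted)
    return true_positives / len(predicted)


# Toy data, invented purely for illustration.
TOY_GENES = [
    KinaseGene("K1", True, 5, True),
    KinaseGene("K2", True, 3, False),
    KinaseGene("K3", True, 2, True),    # fails the mutation threshold
    KinaseGene("K4", False, 7, False),  # not annotated to a cancer function
    KinaseGene("K5", True, 4, True),
    KinaseGene("K6", True, 6, False),
]

if __name__ == "__main__":
    predicted = predict_candidates(TOY_GENES)
    print([g.name for g in predicted], precision(predicted))
```

On the toy data, four genes pass both filters and two of them are gold-standard positives, so the precision is 0.5. The figure of 0.42 in the abstract comes from the authors' own data and cannot be reproduced from this sketch.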
Abstract: It is standard practice, whenever a researcher finds a new gene, to search databases for genes that have a similar sequence. It is not standard practice, whenever a researcher finds a new gene, to search for genes that have similar expression (co-expression). Failure to perform co-expression searches has led to incorrect conclusions about the likely function of new genes, and has led to wasted laboratory attempts to confirm incorrectly predicted functions. We present here the example of Glia Maturation Factor gamma (GMF-gamma). Despite its name, it has not been shown to participate in glia maturation. It is a gene of unknown function that is similar in sequence to GMF-beta. The sequence homology and chromosomal location led to an unsuccessful search for GMF-gamma mutations in glioma. We examined GMF-gamma expression in 1432 human cDNA libraries. The highest expression occurs in phagocytic, antigen-presenting and other hematopoietic cells. We found GMF-gamma mRNA in almost every tissue examined, with expression in nervous tissue no higher than in any other tissue. Our evidence indicates that GMF-gamma participates in phagocytosis in antigen-presenting cells. Searches for genes with similar sequences should be supplemented with searches for genes with similar expression to avoid incorrect predictions.
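A co-expression search of the kind this abstract recommends can be sketched as ranking genes by how closely their expression profiles across libraries track the query gene. The abstract does not state which similarity measure was used, so Pearson correlation is an assumption here, and the gene names and expression values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def coexpressed(query, profiles, top_n=3):
    """Return the top_n genes most correlated with the query, best first."""
    query_profile = profiles[query]
    scored = [
        (name, pearson(query_profile, profile))
        for name, profile in profiles.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical expression levels across four libraries
# (two hematopoietic, then two nervous-tissue); values are made up.
TOY_PROFILES = {
    "query_gene": [9.0, 8.0, 1.0, 2.0],
    "immune_like": [18.0, 16.0, 2.0, 4.0],
    "neural_like": [1.0, 2.0, 9.0, 8.0],
    "flat_gene": [5.0, 5.0, 5.0, 5.0],
}

if __name__ == "__main__":
    for name, r in coexpressed("query_gene", TOY_PROFILES):
        print(f"{name}\t{r:+.2f}")
```

In this toy example the query gene ranks closest to the profile that peaks in the hematopoietic libraries and furthest from the one that peaks in nervous tissue, which mirrors the kind of evidence the abstract uses to argue against a role in glia maturation.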