Abstract: Protein tertiary structure is indispensable for revealing the biological functions of proteins. De novo prediction of protein tertiary structure depends on protein fold recognition. This study proposes a novel method for predicting protein fold types that takes the primary sequence as input. The proposed method, PFP-RFSM, employs a random forest classifier and a comprehensive feature representation that includes both sequence and predicted structure descriptors. In particular, we propose a method for generating features based on sequence motifs; these features are employed in protein fold prediction for the first time. PFP-RFSM and ten representative protein fold predictors are validated on a benchmark dataset consisting of 27 fold types. Experiments demonstrate that PFP-RFSM outperforms all existing protein fold predictors and improves the success rates by 2%-14%. The results suggest that sequence motifs are effective in the classification and analysis of protein sequences.
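The abstract does not say how the motif-based features are built, so the following is only a minimal sketch of one plausible reading: each feature counts how often a given motif occurs in the primary sequence. The function name `motif_features`, the example motifs, and the example sequence are hypothetical, and treating a motif as an exact substring is an assumption, not the PFP-RFSM definition.

```python
def motif_features(sequence, motifs):
    """Return one count per motif: occurrences in the sequence, overlaps included."""
    features = []
    for motif in motifs:
        count = 0
        start = sequence.find(motif)
        while start != -1:
            count += 1
            # Advance by one position so overlapping occurrences are counted.
            start = sequence.find(motif, start + 1)
        features.append(count)
    return features


# Hypothetical motifs and sequence, for illustration only.
example_motifs = ["AG", "GKS", "LL"]
example_sequence = "MAGKSLLLAG"
print(motif_features(example_sequence, example_motifs))
```

A fixed-length vector of this kind could then be concatenated with other descriptors and passed to any random forest implementation; that step is not shown here.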
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61728211 and 61832019.
Abstract: Although the protein sequence-structure gap continues to widen with the development of high-throughput sequencing tools, the protein structure universe appears to be approaching completeness, as no proteins with novel structural folds have recently been deposited in the Protein Data Bank (PDB). In this work, we identify a protein structural dictionary (Frag-K) composed of a set of backbone fragments ranging from 4 to 20 residues as the structural "keywords" that can effectively distinguish between major protein folds. We first apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large set of high-quality, non-homologous protein structures available in the PDB. We analyze the impact of clustering cut-offs on the performance of the fragment libraries. The Frag-K fragments are then employed as structural features to classify protein structures into the major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary of approximately 400 Frag-K fragments, each 4 to 20 residues long, is capable of classifying major SCOP folds with high accuracy.
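As a rough illustration of using dictionary fragments as structural features, the sketch below turns a backbone into a histogram over a fragment dictionary. It is not the Frag-K procedure: the dictionary is assumed to be given (the spectral clustering step is omitted), fragments are compared through their pairwise C-alpha distance signatures rather than by superposition, and the names `distance_signature` and `fragment_histogram` are hypothetical.

```python
import math


def distance_signature(coords):
    """Describe a fragment by all pairwise distances (unchanged by rotation or translation)."""
    return [math.dist(coords[i], coords[j])
            for i in range(len(coords))
            for j in range(i + 1, len(coords))]


def fragment_histogram(backbone, dictionary):
    """Count, per dictionary fragment, how many backbone windows it matches best.

    backbone: list of (x, y, z) C-alpha coordinates.
    dictionary: list of fragments, each a list of (x, y, z) coordinates.
    Windows are only compared with dictionary fragments of the same length.
    """
    signatures = [distance_signature(fragment) for fragment in dictionary]
    counts = [0] * len(dictionary)
    for length in sorted({len(fragment) for fragment in dictionary}):
        candidates = [k for k, f in enumerate(dictionary) if len(f) == length]
        for start in range(len(backbone) - length + 1):
            window = distance_signature(backbone[start:start + length])
            best = min(candidates, key=lambda k: math.dist(window, signatures[k]))
            counts[best] += 1
    return counts


# Toy data: an extended 4-residue fragment and a square-shaped one.
straight = [(3.8 * i, 0.0, 0.0) for i in range(4)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
backbone = [(3.8 * i, 0.0, 0.0) for i in range(5)]
print(fragment_histogram(backbone, [straight, bent]))
```

The resulting count vector has one entry per dictionary fragment, so structures of different lengths map to features of the same dimension.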
Abstract: The progress of the Human Genome Project calls for more rapid and accurate protein structure prediction methods to assign the structure and function of gene products. Threading has proved successful in protein fold assignment, although difficulties remain for sequences with low homology. We have developed a method for solving the sequence rearrangement problem in threading. By reshuffling secondary structure elements, protein structures with the same spatial arrangement of secondary structures but different connections can be predicted. This method has proved useful in fold recognition for proteins that have different evolutionary origins but converge to the same fold.
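The abstract gives no algorithmic detail, so the following is only a toy sketch of the reshuffling idea: it enumerates orderings of a query's secondary structure elements whose types (H for helix, E for strand) line up with a template's order. The function name `matching_rearrangements` and the single-letter encoding are assumptions, and a real threading method would score candidate alignments rather than enumerate them by brute force.

```python
from itertools import permutations


def matching_rearrangements(query_sses, template_sses):
    """Yield orderings of query element indices whose types match the template order.

    query_sses, template_sses: strings such as "HEE", one letter per element.
    """
    if len(query_sses) != len(template_sses):
        return
    for order in permutations(range(len(query_sses))):
        if all(query_sses[i] == t for i, t in zip(order, template_sses)):
            yield order


# The query has helix, strand, strand; the template has strand, helix, strand.
for order in matching_rearrangements("HEE", "EHE"):
    print(order)
```

Each yielded tuple says which query element is placed at each template position, so a different connectivity can still be mapped onto the same spatial arrangement. The brute-force enumeration grows factorially and is only meant for small illustrative inputs.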
Funding: This work was supported by the National Science Foundation and by the Plant Biological Sciences Doctoral Dissertation Fellowship and Summer Fellowship from the University of Minnesota.
Abstract: Ambient light has profound effects on early seedling de-etiolation through red and far-red light-absorbing phytochromes and blue and UV-A light-absorbing cryptochromes. Subsequent integration of various light signal transduction pathways leads to changes in gene expression and morphogenic responses. Here, we report the isolation of a new Arabidopsis light-signaling component, HYPOSENSITIVE TO LIGHT or HTL. Both htl-1 and htl-2 alleles displayed a long hypocotyl phenotype under red, far-red, and blue light, whereas overexpression of HTL caused a short hypocotyl phenotype under similar light conditions. The mutants also showed other photomorphogenic defects such as elongated petioles, retarded cotyledon and leaf expansion, reduced accumulation of chlorophyll and anthocyanin pigments, and attenuated expression of the light-responsive CHLOROPHYLL A/B BINDING PROTEIN 3 and CHALCONE SYNTHASE genes. HTL belongs to an alpha/beta fold protein family and is localized strongly in the nucleus and weakly in the cytosol. The expression of HTL was strongly induced by light of various wavelengths, and this light induction was impaired in the elongated hypocotyl 5 (hy5) mutant. HY5 bound directly to both a C/G-box and a G-box in the HTL promoter, with a greater affinity for the C/G-box. HTL, therefore, represents a new signaling step downstream of HY5 in phy- and cry-mediated de-etiolation responses.