Funding: This work is supported by the National Key Research and Development Program of China (No. 2022YFC2105900).
Abstract: Descriptors play a pivotal role in enzyme design for the greener synthesis of biochemicals, as they can characterize enzymes and chemicals from physicochemical and evolutionary perspectives. This study examined the effects of various descriptors on the performance of a Random Forest model used to predict enzyme-chemical relationships. We curated activity data for seven specific enzyme families from the literature and developed a pipeline for evaluating machine learning model performance using 10-fold cross-validation. The influence of protein and chemical descriptors was assessed in three scenarios: predicting the activity of unknown relations between known enzymes and known chemicals (new relationship evaluation), predicting the activity of novel enzymes on known chemicals (new enzyme evaluation), and predicting the activity of new chemicals on known enzymes (new chemical evaluation). The results showed that protein descriptors significantly enhanced the classification performance of the model in the new enzyme evaluation in three of the seven datasets, those with the greatest number of enzymes, whereas chemical descriptors appeared to have no effect. A variety of sequence-based and structure-based protein descriptors were constructed, among which the ESM-2 descriptor achieved the best results. With enzyme families used as labels, the descriptors clustered proteins well, which could explain the contribution of descriptors to the machine learning model. Conversely, in the new chemical evaluation, chemical descriptors produced a significant improvement in four of the seven datasets, while protein descriptors appeared to have no effect. We attempted to evaluate the generalization ability of the model by correlating the statistics of the datasets with the performance of the models. The results showed that datasets with higher sequence similarity were more likely to achieve better results in the new enzyme evaluation, and datasets with more enzymes were more likely to benefit from the protein descriptor strategy. This work provides guidance
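The three evaluation scenarios named in this abstract differ mainly in what is held out of training in each cross-validation fold: individual enzyme-chemical pairs, whole enzymes, or whole chemicals. The following is a minimal sketch of that idea in plain Python, not the authors' pipeline. The function name `k_fold_splits`, its parameters, and the `(enzyme_id, chemical_id)` pair representation are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def k_fold_splits(pairs, k=10, group_by=None, seed=0):
    """Yield (train_idx, test_idx) index lists over `pairs`.

    pairs    : list of (enzyme_id, chemical_id) tuples (hypothetical format)
    group_by : None       -> hold out individual pairs (new relationship)
               "enzyme"   -> hold out whole enzymes    (new enzyme)
               "chemical" -> hold out whole chemicals  (new chemical)
    """
    if group_by is None:
        keys = list(range(len(pairs)))          # every pair is its own unit
    elif group_by == "enzyme":
        keys = [enzyme for enzyme, _ in pairs]
    elif group_by == "chemical":
        keys = [chemical for _, chemical in pairs]
    else:
        raise ValueError("group_by must be None, 'enzyme', or 'chemical'")

    # Collect the pair indices that belong to each hold-out unit.
    members = defaultdict(list)
    for idx, key in enumerate(keys):
        members[key].append(idx)

    units = list(members)
    if len(units) < k:
        raise ValueError("need at least k distinct units to build k folds")

    # Shuffle units reproducibly, then deal them round-robin into k folds.
    random.Random(seed).shuffle(units)
    for fold in range(k):
        test_idx = sorted(i for unit in units[fold::k] for i in members[unit])
        held_out = set(test_idx)
        train_idx = [i for i in range(len(pairs)) if i not in held_out]
        yield train_idx, test_idx
```

Calling `k_fold_splits(pairs, k=10, group_by="enzyme")` keeps every pair of a held-out enzyme in the test fold, so the model is scored only on enzymes it never saw during training. Any classifier, such as a Random Forest, can then be fitted on the training indices and evaluated on the test indices of each fold.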
Abstract: Britton Chance was a truly inspiring scientist, among the greatest. His long and vigorous career in biochemistry, biophysics, and biological instrumentation began with his now-classic kinetic and spectroscopic studies of the mechanisms of action of individual enzymes: catalases, peroxidases, and dehydrogenases.
Abstract: Protein kinase substrate phage (PKS phage) was constructed by fusing the substrate recognition consensus sequence of cAMP-dependent protein kinase (cAPK) with the bacteriophage minor coat protein g3p and displaying it on the surface of filamentous bacteriophage fd. Phosphorylation in vitro by cAPK showed a unique labelled band of approximately 60 ku, which was consistent with the molecular weight of the PKS-g3p fusion protein. Some weakly phosphorylated bands were also observed for both PKS phage and wild-type phage. A phage-displayed random 15-mer peptide library phosphorylated by cAPK was selected with ferric (Fe3+) chelation affinity resin. After 4 rounds of screening, phage clones were picked and the displayed peptide sequences were determined by DNA sequencing. The results showed that 5 of 14 sequenced phages displayed the cAPK recognition sequence motif (R)RXS/T. In vitro phosphorylation analyses revealed specific labelled bands corresponding to the positive PKS phages, both with and without the typical (R)RXS/T sequence motif. These results suggest that the new method of using ferric (Fe3+) chelation affinity chromatography to identify the substrate specificity of a protein kinase from a random peptide library is feasible.
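The motif (R)RXS/T can be read as an optional arginine, a required arginine, any residue X, and then serine or threonine. As a rough illustration of how sequenced peptides could be scanned for it, here is a small regular-expression sketch. The names `CAPK_MOTIF` and `find_capk_motifs` are hypothetical, the pattern is only one plausible reading of the motif notation, and it is not the analysis used in the study.

```python
import re

# (R)RXS/T: optional R, required R, any residue X (one uppercase letter
# in one-letter amino acid code), then S or T. This encoding is an assumption.
CAPK_MOTIF = re.compile(r"R?R[A-Z][ST]")


def find_capk_motifs(peptide):
    """Return (start, matched_text) for each non-overlapping motif hit.

    `peptide` is a one-letter amino acid string; it is upper-cased first.
    Overlapping occurrences are not reported by re.finditer.
    """
    return [(m.start(), m.group()) for m in CAPK_MOTIF.finditer(peptide.upper())]
```

For example, `find_capk_motifs("LRRASLG")` reports the hit `RRAS` starting at index 1, while a peptide with no arginine followed two positions later by serine or threonine returns an empty list.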