Funding: Supported in part by the National Natural Science Foundation of China (22033001), the National Key R&D Program of China (2022YFA1303700), and the Chinese Academy of Medical Sciences (2021-I2M-5-014).
Abstract: Proteins are integral to essential life processes, making protein research a fundamental field with the potential to advance pharmaceuticals and disease investigation. Within protein research, there is a pressing need to uncover protein functions and untangle their intricate mechanisms. Because experimental investigations are costly and limited in throughput, computational models offer a promising alternative for accelerating protein function annotation. In recent years, protein pre-training models have advanced notably across multiple prediction tasks, which points to a promising route for tackling the intricate downstream task of protein function prediction. In this review, we describe the historical evolution and research paradigms of computational methods for predicting protein function. We then summarize progress in protein and molecule representation and in feature extraction techniques. Finally, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, offering a comprehensive perspective on progress in this field.
Funding: The National Natural Science Foundation of China (Grant Nos. 30170515, 30370388, 30370798, 30570424 and 30571034), the National High Tech Development Project of China (Grant Nos. 2003AA2Z2051 and 2002AA2Z2052), the Heilongjiang Science & Technology Key Project (Grant No. GB03C602-4), the Harbin (City) Science & Technology Key Project (Grant No. 2003AA3CS113), the Natural Science Foundation of Heilongjiang (Grant No. F0177), and the Outstanding Overseas Scientist Foundation of the Education Department of Heilongjiang Province (Grant No. 1055HG009).
Abstract: GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach that we proposed previously, is based on gene expression similarity and the concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods for filtering the neighbors of a protein of unknown function(s) in protein interaction networks. Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and calls on genes annotated to nearby classes to support predictions for small, specific classes in GO. Using yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the “biological process” ontology with three measures designed specifically for functional classes organized in GO. The results show that our method can predict gene functions broadly and with very specific functional terms. Based on the GO database published in December 2004, we predict functions for some proteins whose functions were unknown at that time, and some of these predictions have been confirmed by the new SGD annotation data published in April 2006.
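As a rough illustration of the general neighbor-based idea mentioned in this abstract, the sketch below ranks candidate functional terms for an unannotated protein by counting how many of its direct interaction partners carry each term. It is not the GESTs algorithm: it leaves out gene expression similarity, GO concept similarity, and the neighbor-filtering rules, none of which the abstract specifies. The function name `predict_terms`, the `min_support` threshold, and the toy protein and term labels are all hypothetical.

```python
from collections import Counter


def predict_terms(protein, interactions, annotations, min_support=2):
    """Rank terms for `protein` by the number of annotated neighbors carrying them.

    A plain neighbor-counting baseline (an assumption for illustration),
    not the method described in the abstract.
    """
    # Collect direct interaction partners, treating edges as undirected.
    neighbors = set()
    for a, b in interactions:
        if a == protein and b != protein:
            neighbors.add(b)
        elif b == protein and a != protein:
            neighbors.add(a)

    # Each neighbor votes at most once per term.
    votes = Counter()
    for n in neighbors:
        votes.update(set(annotations.get(n, ())))

    # Keep terms supported by at least `min_support` neighbors,
    # ordered by vote count (descending), then by term label.
    kept = [(term, count) for term, count in votes.items() if count >= min_support]
    return sorted(kept, key=lambda tc: (-tc[1], tc[0]))


# Toy data with made-up protein names and placeholder term labels.
interactions = [("P1", "P2"), ("P1", "P3"), ("P4", "P1"), ("P2", "P3")]
annotations = {
    "P2": ["TERM_A", "TERM_B"],
    "P3": ["TERM_A"],
    "P4": ["TERM_A", "TERM_C"],
}

print(predict_terms("P1", interactions, annotations))
```

Raising `min_support` trades coverage for reliability, which is one simple stand-in for the kind of neighbor filtering the abstract alludes to.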
Funding: Supported by the National Natural Science Foundation of China (T2225002, 82273855 to M.Y.Z., 82204278 to X.T.L.), Lingang Laboratory (LG202102-01-02 to M.Y.Z.), the National Key Research and Development Program of China (2022YFC3400504 to M.Y.Z.), the SIMM-SHUTCM Traditional Chinese Medicine Innovation Joint Research Program (E2G805H to M.Y.Z.), the Shanghai Municipal Science and Technology Major Project, and the China Postdoctoral Science Foundation (2022M720153 to X.T.L.).
Abstract: Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, anticipating drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based ML models for CPI prediction, highlighting their performance and achievements. It also offers insights into CPI prediction datasets and evaluation benchmarks. Lastly, the article assesses the current landscape of CPI prediction, describing the challenges faced and outlining emerging trends that may advance the field.
Funding: This work was supported by the 863 Hi-Tech Program (Grant Nos. 2001AA231011, 2002AA231051, 2003AA23101 l & 2004BA711A21), the National 973 Key Basic Research Program (Grant Nos. 2002CB713807, 2003CB715901 & 2004CB518606), and the National Natural Science Foundation of China (Grant No. 904080 10).
Abstract: Protein-protein interactions play key roles in cells. Many experimental approaches and in silico methods have been developed to identify and predict protein-protein interactions on a large scale. However, compared with traditional experimental results, high-throughput protein-protein interaction data often contain false positives with high probability. To make full use of these large-scale data, bioinformatic methods are needed to evaluate them systematically, improving data reliability and supporting the mining of biological information. This review summarizes methodologies for analyzing and applying high-throughput protein-protein interaction data, including evaluation methods, the relationship between protein-protein interaction data and other biological information about proteins, and applications in biological study. In addition, the paper suggests some interesting topics in mining high-throughput protein-protein interaction data.