Functional characterization of every single protein is a major challenge of the post-genomic era. The large-scale analysis of a cell's proteins, proteomics, seeks to provide these proteins with reliable annotations regarding their interaction partners and functions in the cellular machinery. An important step along the way is to determine the subcellular localization of each protein. Eukaryotic cells are divided into subcellular compartments, or organelles, and transport across membranes into these organelles is a highly regulated and complex cellular process. Predicting subcellular localization by computational means has been an area of intense activity in recent years. The publicly available prediction methods differ mainly in four aspects that matter to the user: the underlying biological motivation, the computational method used, localization coverage, and reliability. This review provides a short description of the main events in the protein sorting process and an overview of the most commonly used methods in this field.
The ability to predict the subcellular localization of a protein from its sequence is of great importance, as it provides information about the protein's function. We present a computational tool, PredSL, which uses neural networks, Markov chains, profile hidden Markov models, and scoring matrices to predict the subcellular localization of proteins in eukaryotic cells from the N-terminal amino acid sequence. It aims to classify proteins into five groups: chloroplast, thylakoid, mitochondrion, secretory pathway, and "other". In fivefold cross-validation, PredSL achieves 86.7% and 87.1% overall accuracy on the plant and non-plant datasets, respectively. In most cases, its results are comparable to those of TargetP, the most widely used method to date, and of LumenP. When tested on the experimentally verified proteins of the Saccharomyces cerevisiae genome, PredSL performs comparably to, if not better than, any available algorithm for the same task. Furthermore, PredSL is the only method capable of predicting these subcellular localizations that is also available as a stand-alone application, which can be obtained from http://bioinformatics.biol.uoa.gr/PredSL/.
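The PredSL abstract lists Markov chains among the models applied to the N-terminal sequence. The sketch below illustrates that one idea only. It trains one first-order Markov chain per class on N-terminal windows. A new sequence is then assigned to the class whose chain gives it the highest log-likelihood.

The following details are all hypothetical and are not PredSL's actual parameters, classes, or data:
- the window length (`N_TERMINAL_LENGTH = 30`);
- add-one smoothing;
- the two-class setup;
- the toy training fragments.

PredSL itself combines this kind of model with neural networks, profile HMMs, and scoring matrices across five classes.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_TERMINAL_LENGTH = 30  # hypothetical window size, not PredSL's actual setting


def n_terminal(seq, length=N_TERMINAL_LENGTH):
    """Return the first `length` residues of a protein sequence."""
    return seq[:length].upper()


class MarkovChain:
    """First-order Markov chain over residues with add-one (Laplace) smoothing."""

    def __init__(self, sequences):
        self.start = defaultdict(int)
        self.trans = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            if not seq:
                continue
            self.start[seq[0]] += 1
            for a, b in zip(seq, seq[1:]):
                self.trans[a][b] += 1
        self.n_start = sum(self.start.values())

    def log_prob(self, seq):
        """Log-likelihood of `seq` under this chain (0.0 for an empty sequence)."""
        if not seq:
            return 0.0
        k = len(AMINO_ACIDS)
        lp = math.log((self.start.get(seq[0], 0) + 1) / (self.n_start + k))
        for a, b in zip(seq, seq[1:]):
            row = self.trans.get(a, {})
            lp += math.log((row.get(b, 0) + 1) / (sum(row.values()) + k))
        return lp


def classify(seq, models):
    """Assign `seq` to the label whose chain scores its N-terminal window highest."""
    window = n_terminal(seq)
    return max(models, key=lambda label: models[label].log_prob(window))


# Toy, made-up N-terminal fragments (not real proteins or PredSL training data).
TRAINING = {
    "mitochondrion": ["MLRAARSLLRRSAFSTRLAR", "MLSRAVRSALRSRLFSTAAR"],
    "secretory pathway": ["MKLLVLLALVAFASAQ", "MKVLLLAVLLLAVASA"],
}
MODELS = {label: MarkovChain([n_terminal(s) for s in seqs])
          for label, seqs in TRAINING.items()}

print(classify("MLRRSALRSRAASLRR", MODELS))  # mitochondrion (arginine-rich)
print(classify("MKLLLVLAVLLFAVSA", MODELS))  # secretory pathway (hydrophobic)
```

A per-class generative model like this is easy to extend to more classes. Add one chain per label; `classify` then compares all of them.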
Understanding the subcellular localization of long non-coding RNAs (lncRNAs) is crucial for unraveling their functional mechanisms. While previous computational methods have made progress in predicting lncRNA subcellular localization, most of them ignore sequence order information because they encode lncRNA sequences as k-mer frequency features. In this study, we develop SGCL-LncLoc, a novel interpretable deep learning model based on supervised graph contrastive learning. SGCL-LncLoc transforms lncRNA sequences into de Bruijn graphs and uses the Word2Vec technique to learn the node representations of each graph. SGCL-LncLoc then applies graph convolutional networks to learn a comprehensive graph representation. Additionally, we propose a computational method that maps the attention weights of the graph nodes to weights of the nucleotides in the lncRNA sequence, allowing SGCL-LncLoc to serve as an interpretable deep learning model. Furthermore, SGCL-LncLoc employs a supervised contrastive learning strategy, which leverages the relationships between different samples together with their label information to guide the model toward better representation learning for lncRNAs. Extensive experimental results demonstrate that SGCL-LncLoc outperforms both deep learning baseline models and existing predictors, showing its capability for accurate lncRNA subcellular localization prediction. We also conduct a motif analysis, which reveals that SGCL-LncLoc successfully captures known motifs associated with lncRNA subcellular localization. The SGCL-LncLoc web server is available at http://csuligroup.com:8000/SGCL-LncLoc, and the source code can be obtained from https://github.com/CSUBioGroup/SGCL-LncLoc.
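The SGCL-LncLoc abstract mentions two steps that a small sketch can illustrate: turning a sequence into a de Bruijn graph, and mapping node attention weights back to nucleotides.

The sketch assumes one common de Bruijn-style convention. Each distinct k-mer is a node, and each pair of consecutive, overlapping k-mers adds weight to a directed edge. The classical de Bruijn graph instead uses (k-1)-mers as nodes, and the abstract states neither the convention nor the value of k. The following are illustrative assumptions, not the published method:
- the function names;
- `k=3`;
- the rule of averaging node weights over every window that covers a nucleotide.

```python
from collections import Counter


def sequence_to_graph(seq, k=3):
    """Build a de Bruijn-style graph from a sequence (assumed convention).

    Nodes are the distinct k-mers, sorted for a stable index order; each pair of
    consecutive, overlapping k-mers adds one to the weight of a directed edge.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    nodes = sorted(set(kmers))
    index = {kmer: i for i, kmer in enumerate(nodes)}
    edges = Counter((index[a], index[b]) for a, b in zip(kmers, kmers[1:]))
    return nodes, dict(edges)


def nucleotide_weights(seq, nodes, node_weights, k=3):
    """Map per-node attention weights back onto nucleotide positions (hypothetical rule).

    Every occurrence of a k-mer contributes its node's weight to the k positions
    it covers; each position's weight is the mean over the windows covering it.
    """
    seq = seq.upper()
    weight_of = dict(zip(nodes, node_weights))
    totals = [0.0] * len(seq)
    counts = [0] * len(seq)
    for start in range(len(seq) - k + 1):
        w = weight_of[seq[start:start + k]]
        for pos in range(start, start + k):
            totals[pos] += w
            counts[pos] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


nodes, edges = sequence_to_graph("ACGTACG", k=3)
print(nodes)  # ['ACG', 'CGT', 'GTA', 'TAC']
print(edges)  # {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}
# Hypothetical attention: all weight on the repeated node 'ACG'.
print(nucleotide_weights("ACGTACG", nodes, [1.0, 0.0, 0.0, 0.0], k=3))
# approximately [1.0, 0.5, 0.333, 0.0, 0.333, 0.5, 1.0]
```

In a full pipeline, two more steps would follow, and neither is shown here. The k-mers in `nodes` would be the tokens embedded by Word2Vec. The `edges` would form the adjacency passed to a graph convolutional network.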
Funding: supported by the National Natural Science Foundation of China (No. 62102457), the Hunan Provincial Natural Science Foundation of China (No. 2023JJ40763), the Hunan Provincial Science and Technology Program (No. 2021RC4008), and the Fundamental Research Funds for the Central Universities of Central South University (No. CX20230271).
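The SGCL-LncLoc abstract describes its supervised contrastive learning strategy only at a high level. For orientation, the sketch below implements the widely used supervised contrastive (SupCon) loss of Khosla et al. (2020) in plain Python. For each anchor, embeddings of other samples with the same localization label are the positives, and all other samples form the softmax denominator.

It is an assumption whether SGCL-LncLoc uses exactly this form, this temperature (`temperature=0.1`), or this normalization. The function names and the toy embeddings are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def sup_con_loss(embeddings, labels, temperature=0.1):
    """SupCon loss averaged over anchors that have at least one positive."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-label partner contribute nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability of the positives.
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy 2-D embeddings forming two clusters (hypothetical, not lncRNA data).
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(sup_con_loss(emb, [0, 0, 1, 1]))  # close to zero: labels match the clusters
print(sup_con_loss(emb, [0, 1, 0, 1]))  # much larger: labels cut across the clusters
```

With these toy embeddings, the loss is near zero when same-label samples already lie close together. It becomes large when they do not. Minimizing this gap is what pulls representations of lncRNAs with the same localization together.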