Abstract
Understanding the subcellular localization of long non-coding RNAs (lncRNAs) is crucial for unraveling their functional mechanisms. While previous computational methods have made progress in predicting lncRNA subcellular localization, most of them ignore sequence order information because they rely on k-mer frequency features to encode lncRNA sequences. In this study, we develop SGCL-LncLoc, a novel interpretable deep learning model based on supervised graph contrastive learning. SGCL-LncLoc transforms lncRNA sequences into de Bruijn graphs and uses the Word2Vec technique to learn the node representations of each graph. SGCL-LncLoc then applies graph convolutional networks to learn a comprehensive graph representation. Additionally, we propose a computational method that maps the attention weights of the graph nodes to weights of the nucleotides in the lncRNA sequence, allowing SGCL-LncLoc to serve as an interpretable deep learning model. Furthermore, SGCL-LncLoc employs a supervised contrastive learning strategy, which leverages the relationships between different samples together with label information to guide the model toward better representation learning for lncRNAs. Extensive experimental results demonstrate that SGCL-LncLoc outperforms both deep learning baseline models and existing predictors, showing its capability for accurate lncRNA subcellular localization prediction. We also conduct a motif analysis, which reveals that SGCL-LncLoc successfully captures known motifs associated with lncRNA subcellular localization. The SGCL-LncLoc web server is available at http://csuligroup.com:8000/SGCL-LncLoc. The source code can be obtained from https://github.com/CSUBioGroup/SGCL-LncLoc.
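To illustrate why a de Bruijn graph retains the sequence order information that k-mer frequency features discard, the following is a minimal sketch of turning a sequence into such a graph. It is not the SGCL-LncLoc implementation: the function name `de_bruijn_graph`, the choice of k-mers as nodes, the edge-count weighting, and the value k = 3 are all assumptions made for illustration.

```python
from collections import Counter


def de_bruijn_graph(seq, k=3):
    """Hypothetical helper: nodes are the distinct k-mers of `seq`;
    a directed edge links each k-mer to the k-mer that follows it,
    weighted by how often that transition occurs."""
    seq = seq.upper()
    # Slide a window of width k over the sequence, one nucleotide at a time.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    nodes = sorted(set(kmers))
    # Consecutive k-mers overlap by k-1 nucleotides; counting these
    # transitions is what preserves the order information.
    edges = Counter(zip(kmers, kmers[1:]))
    return nodes, edges


nodes, edges = de_bruijn_graph("AUGCAUGC", k=3)
for (src, dst), weight in sorted(edges.items()):
    print(f"{src} -> {dst} (x{weight})")
```

Two sequences with identical k-mer counts can still yield different edge sets here, which is the property a graph-based encoder can exploit.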
Funding
Supported by the National Natural Science Foundation of China (No. 62102457), the Hunan Provincial Natural Science Foundation of China (No. 2023JJ40763), the Hunan Provincial Science and Technology Program (No. 2021RC4008), and the Fundamental Research Funds for the Central Universities of Central South University (No. CX20230271).