Abstract
With the advent of high-throughput sequencing technologies, large amounts of transcriptome data have become available, and evaluating the coding potential of evolutionarily conserved regions has become a core task in transcript analysis. Predicting the coding potential of transcripts can be used to identify long noncoding RNAs (lncRNAs). An lncRNA is a noncoding RNA longer than 200 nucleotides. Studies have shown that lncRNAs are important in many organisms and play regulatory roles at multiple levels, including chromatin modification, epigenetics, transcription, and post-transcriptional regulation. Many machine-learning-based tools have been developed to distinguish coding from noncoding transcript sequences. Because different tools are typically designed for different scenarios, a suitable method must be chosen for the specific situation. This review analyses several commonly used tools, summarising their characteristics, advantages, disadvantages, and scope of application, to help researchers choose a suitable method and obtain more reliable results.
Author
YANG Yang (杨阳), School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Source
Intelligent Computer and Applications (《智能计算机与应用》), 2020, No. 3, pp. 375-378 (4 pages)