Christopher Klapproth, Rituparno Sen, Peter F. Stadler, Sven Findeiß, Jörg Fallmann
Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz-Center for Infection Research (HZI), D-97080 Würzburg, Germany.
Noncoding RNA. 2021 Dec 13;7(4):77. doi: 10.3390/ncrna7040077.
Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.