Brief Funct Genomics. 2021 Jun 9;20(3):162-173. doi: 10.1093/bfgp/elab016.
Accurately and rapidly distinguishing long noncoding RNAs (lncRNAs) from other transcripts is a prerequisite for exploring their biological functions. In recent years, many computational methods have been developed to predict lncRNAs from transcripts, but these methods have not yet been systematically reviewed. In this review, we introduce the databases and features used to develop computational prediction models, and then summarize existing state-of-the-art computational methods, including those based on binary classifiers, deep learning and ensemble learning. Because a user-friendly way of running these methods is still lacking, we developed ezLncPred, a Python package that provides a practical command-line implementation of nine state-of-the-art lncRNA prediction methods. Finally, we discuss the challenges of lncRNA prediction and future directions.