College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
School of Computer Science, Wuhan University, Wuhan 430072, China.
Genes (Basel). 2019 Sep 3;10(9):672. doi: 10.3390/genes10090672.
Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 nucleotides (nt) that do not encode proteins; nevertheless, they have many vital biological functions. The development of high-throughput sequencing technology has led to the discovery of a large number of novel transcripts, so computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method, abbreviated as PredLnc-GFStack, to predict lncRNAs from transcripts. We use a genetic algorithm (GA) to select the critical features from the candidate feature list and then apply stacked ensemble learning to construct the PredLnc-GFStack model. Computational experiments show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability in cross-species ncRNA prediction.
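The first stage described above, GA-based selection of features from a candidate list, can be illustrated with a minimal sketch. It assumes a binary-mask encoding (one gene per candidate feature), tournament selection, one-point crossover, bit-flip mutation and elitism; these operators, the parameter values, and the helper names `ga_select_features` and `toy_fitness` are hypothetical choices for illustration, not the settings used in PredLnc-GFStack. In a real pipeline the fitness would plausibly be something like the cross-validated performance of a classifier trained on the selected features. Here it is replaced by a toy function so that the example runs on its own.

```python
import random
from typing import Callable, List, Optional

Mask = List[int]  # one gene per candidate feature: 1 = keep, 0 = drop


def ga_select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 100,
    crossover_rate: float = 0.9,
    mutation_rate: Optional[float] = None,
    tournament_k: int = 3,
    seed: int = 0,
) -> Mask:
    """Return the best feature mask found by a simple generational GA (n_features >= 2)."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features  # about one bit flip per child on average

    def tournament(pop: List[Mask], scores: List[float]) -> Mask:
        # Draw k distinct individuals at random and return the fittest one.
        idx = max(rng.sample(range(len(pop)), tournament_k), key=lambda i: scores[i])
        return pop[idx]

    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best_mask: Mask = population[0][:]
    best_score = float("-inf")

    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        # Remember the best individual seen so far (elitism).
        for ind, s in zip(population, scores):
            if s > best_score:
                best_mask, best_score = ind[:], s
        next_pop = [best_mask[:]]  # the elite survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(population, scores), tournament(population, scores)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: toggle each gene independently.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop

    # Score the last generation as well before returning.
    for ind in population:
        s = fitness(ind)
        if s > best_score:
            best_mask, best_score = ind[:], s
    return best_mask


# Hypothetical stand-in for classifier accuracy: features 0, 2 and 5 are
# informative, and every selected feature costs a small penalty.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    gain = sum(1.0 for i, g in enumerate(mask) if g and i in INFORMATIVE)
    return gain - 0.1 * sum(mask)


best = ga_select_features(8, toy_fitness)
print([i for i, g in enumerate(best) if g])
```

The small per-feature penalty pushes the search toward compact feature subsets. With a real classifier-based fitness, each evaluation is expensive, so caching scores per mask would be a natural next step. The selected columns would then feed the second, stacked-ensemble stage.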