Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, 710048, China.
Department of Information Engineering, College of Technology, Hubei Engineering University, Xiaogan, Hubei, 432000, China.
BMC Genomics. 2024 Aug 2;25(1):756. doi: 10.1186/s12864-024-10662-y.
Long non-coding RNAs (lncRNAs) are RNA transcripts of more than 200 nucleotides that do not encode canonical proteins. Their biological structure is similar to that of messenger RNAs (mRNAs). To distinguish lncRNA transcripts from mRNA transcripts quickly and accurately, we upgraded the alignment-free tool PLEK to its next version, PLEKv2, and constructed models tailored for both animals and plants.
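As a rough illustration of what "alignment-free" can mean in practice, the sketch below turns a transcript into a fixed-length vector of k-mer frequencies without comparing it to any reference sequence. The abstract does not state which features PLEKv2 uses, so the feature choice, the function name `kmer_frequencies`, and the parameter `k` are assumptions made for illustration only, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return the relative frequency of every k-mer over A/C/G/T in seq.

    Hypothetical helper for illustration; not part of the PLEKv2 software.
    """
    # Treat RNA and DNA alphabets alike by mapping U to T.
    seq = seq.upper().replace("U", "T")
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k k-mers so the output always has the same length.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    features = kmer_frequencies("AUGGCCAUGG", k=2)
    print(len(features), features["AT"])
```

A vector like this could be passed to any classifier; because it depends only on the sequence itself, it needs no alignment step or reference database.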
PLEKv2 achieves 98.7% prediction accuracy on human datasets. Compared with classical tools and deep learning-based models, this is 8.1%, 3.7%, 16.6%, 1.4%, 4.9%, and 48.9% higher than CPC2, CNCI, Wen et al.'s CNN, LncADeep, PLEK, and NcResNet, respectively. In cross-species prediction, the accuracy of PLEKv2 exceeded 90%. PLEKv2 is more effective and robust than CPC2, CNCI, LncADeep, PLEK, and NcResNet on primate datasets (including chimpanzees, macaques, and gorillas). Moreover, PLEKv2 is suitable not only for non-human primates, which are closely related to humans, but also for predicting the coding ability of RNA sequences in plants such as Arabidopsis.
The experimental results show that the model constructed by PLEKv2 distinguishes lncRNAs from mRNAs better than PLEK does. The PLEKv2 software is freely available at https://sourceforge.net/projects/plek2/.