College of Forestry, Nanjing Forestry University, Nanjing, China.
College of Information Science and Technology, Nanjing Forestry University, Nanjing, China.
PLoS One. 2023 Jun 1;18(6):e0286377. doi: 10.1371/journal.pone.0286377. eCollection 2023.
Long non-coding RNAs (lncRNAs) have been widely studied for their biological significance. In general, they need to be distinguished from protein-coding RNAs (pcRNAs) with similar functions. Algorithms and tools based on various strategies have been designed and developed to train and validate such classifiers. However, many of them lack scalability and versatility and rely heavily on genome annotation. In this paper, we design a convenient and biologically meaningful classification tool, "PreLnc2", which uses multi-scale position and frequency information from the wavelet transform spectrum and generalizes the frequency statistics method. We then used the extracted features together with auxiliary features to train the model and verified it on test data. PreLnc2 achieved 93.2% accuracy for animal and plant transcripts, a 2.1% improvement over PreLnc, and provides an effective alternative for the prediction of lncRNAs.
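The two feature families named in the abstract, frequency statistics and multi-scale wavelet information, can be illustrated with a minimal sketch. This is not the PreLnc2 implementation: the abstract does not specify the sequence encoding, the wavelet family, the k-mer length, or the number of scales, so the GC indicator signal, the Haar wavelet, `k=3`, and `levels=3` below are all assumptions, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=3):
    """Relative frequency of every k-mer over ACGT, using overlapping windows."""
    seq = seq.upper().replace("U", "T")
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with N or other ambiguity codes
            counts[window] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd lengths by repeating the last sample
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / sqrt(2) for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2) for i in pairs]
    return approx, detail


def wavelet_energy_features(seq, levels=3):
    """Energy of the Haar detail coefficients at each scale.

    Assumption: the transcript is encoded as a GC indicator signal
    (1.0 for G or C, 0.0 otherwise); the paper may use another encoding.
    """
    signal = [1.0 if base in "GC" else 0.0 for base in seq.upper()]
    energies = []
    for _ in range(levels):
        if len(signal) < 2:
            energies.append(0.0)  # sequence too short for this scale
            continue
        signal, detail = haar_dwt(signal)
        energies.append(sum(d * d for d in detail))
    return energies
```

Concatenating `list(kmer_frequencies(seq).values())` with `wavelet_energy_features(seq)` gives one fixed-length feature vector per transcript, which could then be passed to any standard classifier.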