PlantLncBoost: key features for plant lncRNA identification and significant improvement in accuracy and generalization.

Authors

Tian Xue-Chan, Nie Shuai, Domingues Douglas, Rossi Paschoal Alexandre, Jiang Li-Bo, Mao Jian-Feng

Affiliations

School of Life Sciences and Medicine, Shandong University of Technology, Zibo, Shandong, 255000, China.

State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China.

Publication information

New Phytol. 2025 Aug;247(3):1538-1549. doi: 10.1111/nph.70211. Epub 2025 May 27.

Abstract

Long noncoding RNAs (lncRNAs) are critical regulators of numerous biological processes in plants. Nevertheless, their identification is challenging due to the low sequence conservation across various species. Existing computational methods for lncRNA identification often face difficulties in generalizing across diverse plant species, highlighting the need for more robust and versatile identification models. Here, we present PlantLncBoost, a novel computational tool designed to improve the generalization in plant lncRNA identification. By integrating advanced gradient boosting algorithms with comprehensive feature selection, our approach achieves both high accuracy and generalizability. We conducted an extensive analysis of 1662 features and identified three key features - ORF coverage, complex Fourier average, and atomic Fourier amplitude - that effectively distinguish lncRNAs from mRNAs. We assessed the performance of PlantLncBoost using comprehensive datasets from 20 plant species. The model exhibited exceptional performance, with an accuracy of 96.63%, a sensitivity of 98.42%, and a specificity of 94.93%, significantly outperforming existing tools. Further analysis revealed that the features we selected effectively capture the differences between lncRNAs and mRNAs across a variety of plant species. PlantLncBoost represents a significant advancement in plant lncRNA identification. It is freely accessible on GitHub (https://github.com/xuechantian/PlantLncBoost) and has been integrated into a comprehensive analysis pipeline, Plant-LncRNA-pipeline v.2 (https://github.com/xuechantian/Plant-LncRNA-pipeline-v2).
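To make the first of the three key features concrete, here is a minimal Python sketch of an ORF coverage calculation. The abstract does not define the feature, so this assumes a common convention: the length of the longest forward-strand open reading frame (ATG through the first in-frame stop codon, stop included) divided by the transcript length. The function names `longest_orf_length` and `orf_coverage` are hypothetical, and this is not the PlantLncBoost implementation.

```python
# Illustrative sketch only; the exact ORF definition used by PlantLncBoost
# is an assumption here (longest ATG..stop ORF on the forward strand).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Return the length (nt) of the longest ATG..stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # stop codon is counted
                start = None
    return best


def orf_coverage(seq: str) -> float:
    """Fraction of the transcript covered by its longest ORF (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return longest_orf_length(seq) / len(seq)


if __name__ == "__main__":
    toy_transcript = "CCATGAAATAGCC"  # hypothetical toy sequence, not real data
    print(f"ORF coverage: {orf_coverage(toy_transcript):.3f}")
```

Under this convention, coding transcripts would be expected to score high because most of their length is ORF, whereas lncRNAs would tend to score low. The two Fourier-based features are not sketched here, since the abstract gives no detail on how they are computed.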
