Nmix: a hybrid deep learning model for precise prediction of 2'-O-methylation sites based on multi-feature fusion and ensemble learning.

Affiliations

Department of Physics, School of Science, Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China.

Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, No. 92 Weijin Road, Nankai District, Tianjin 300072, China.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae601.

Abstract

RNA 2'-O-methylation (Nm) is a crucial post-transcriptional modification with significant biological implications. However, experimental identification of Nm sites is challenging and resource-intensive. While multiple computational tools have been developed to identify Nm sites, their predictive performance, particularly in terms of precision and generalization capability, remains deficient. We introduced Nmix, an advanced computational tool for precise prediction of Nm sites in human RNA. We constructed the largest low-redundancy dataset of experimentally verified Nm sites and employed an innovative multi-feature fusion approach, combining one-hot, Z-curve and RNA secondary structure encoding. Nmix utilizes a meticulously designed hybrid deep learning architecture, integrating 1D/2D convolutional neural networks, a self-attention mechanism and residual connections. We implemented an asymmetric loss function and Bayesian optimization-based ensemble learning, substantially improving predictive performance on imbalanced datasets. Rigorous testing on two benchmark datasets revealed that Nmix significantly outperforms existing state-of-the-art methods across various metrics, particularly in precision (average improvements of 33.1% and 60.0%) and Matthews correlation coefficient (average improvements of 24.7% and 51.1%). Notably, Nmix demonstrated exceptional cross-species generalization capability, accurately predicting 93.8% of experimentally verified Nm sites in rat RNA. We also developed a user-friendly web server (https://tubic.org/Nm) and provided standalone prediction scripts to facilitate widespread adoption. We hope that by providing a more accurate and robust tool for Nm site prediction, we can contribute to advancing our understanding of Nm mechanisms and potentially benefit the prediction of other RNA modification sites.
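
To make the multi-feature fusion idea concrete, the following is a minimal sketch of how one-hot and Z-curve encodings of an RNA window might be computed and concatenated per position. It is not the Nmix implementation: the abstract does not say which Z-curve variant is used or how the features are combined, so the classic cumulative Z-curve, the per-position concatenation, and the function names (`one_hot`, `z_curve`, `fuse`) are all assumptions made for illustration. The RNA secondary structure encoding is omitted because it requires an external folding tool.

```python
from itertools import accumulate

BASES = "ACGU"

# Per-base Z-curve steps (classic definition, with U in place of T):
#   x: purine (A, G) vs pyrimidine (C, U)
#   y: amino (A, C) vs keto (G, U)
#   z: weak (A, U) vs strong (C, G) hydrogen bonding
Z_STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "U": (-1, -1, 1),
}


def one_hot(seq):
    """Return one 4-element indicator vector per base (order A, C, G, U).

    Bases outside ACGU (e.g. N) become all zeros; this handling is an
    assumption, not something stated in the abstract.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def z_curve(seq):
    """Return cumulative Z-curve coordinates (x_n, y_n, z_n) per position.

    Unknown bases contribute a zero step (assumption).
    """
    steps = [Z_STEP.get(base, (0, 0, 0)) for base in seq]
    xs = accumulate(s[0] for s in steps)
    ys = accumulate(s[1] for s in steps)
    zs = accumulate(s[2] for s in steps)
    return [list(point) for point in zip(xs, ys, zs)]


def fuse(seq):
    """Concatenate one-hot and Z-curve features into an L x 7 matrix.

    Per-position concatenation is a hypothetical fusion scheme.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return [oh + zc for oh, zc in zip(one_hot(seq), z_curve(seq))]


if __name__ == "__main__":
    for row in fuse("ACGU"):
        print(row)
```

Each row holds four one-hot values followed by the three cumulative Z-curve coordinates at that position. A matrix of this shape could be passed to a 1D convolutional layer, but the actual input layout used by Nmix is not described in the abstract.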

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/126b/11568878/de7f6237f011/bbae601f1.jpg
