GANSamples-ac4C: Enhancing ac4C site prediction via generative adversarial networks and transfer learning.

Author information

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.

College of Software, Jilin University, Changchun, Jilin, 130012, China.

Publication information

Anal Biochem. 2024 Jun;689:115495. doi: 10.1016/j.ab.2024.115495. Epub 2024 Feb 29.

Abstract

N4-acetylcytidine (ac4C) is an RNA modification catalyzed by N-acetyltransferase 10 (NAT10) that plays an essential role in tRNA, rRNA, and mRNA. It influences various cellular functions, including mRNA stability and rRNA biosynthesis. Wet-lab detection of ac4C modification sites is resource-intensive and costly, so various machine learning and deep learning techniques have been employed to detect these sites computationally. However, the known ac4C modification sites are too few to train an accurate and stable prediction model. This study introduces GANSamples-ac4C, a novel framework that combines transfer learning with a generative adversarial network (GAN) to generate synthetic RNA sequences for training a better ac4C modification site prediction model. Comparative analysis shows that GANSamples-ac4C outperforms existing state-of-the-art methods in identifying ac4C sites. Moreover, our results underscore the potential of synthetic data to mitigate data scarcity in biological sequence prediction tasks. Another major advantage of GANSamples-ac4C is its interpretable decision logic. Multi-faceted interpretability analyses identify key regions in the ac4C sequences that influence the discrimination between positive and negative samples, a pronounced enrichment of G in these regions, and ac4C-associated motifs. These findings may offer novel insights for ac4C research. The GANSamples-ac4C framework and its source code are publicly accessible at http://www.healthinformaticslab.org/supp/.
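The abstract does not describe how the G enrichment was measured. As a purely illustrative sketch, one simple way to look for such a signal is to compare, position by position, how often G occurs in positive versus negative sequences. The function names and the toy sequences below are hypothetical and are not taken from GANSamples-ac4C or its data.

```python
def positional_frequency(sequences, base="G"):
    """Return, for each position, the fraction of sequences with `base` there."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    return [
        sum(1 for s in sequences if s[i] == base) / len(sequences)
        for i in range(length)
    ]


def enrichment(positives, negatives, base="G"):
    """Per-position difference in `base` frequency (positives minus negatives)."""
    pos = positional_frequency(positives, base)
    neg = positional_frequency(negatives, base)
    if len(pos) != len(neg):
        raise ValueError("positive and negative sequences must share a length")
    return [p - n for p, n in zip(pos, neg)]


# Toy, made-up sequences (assumption: fixed-length windows around a candidate C).
positives = ["GGCGA", "AGCGU", "GGCUA"]
negatives = ["AUCAA", "UACGU", "CACUA"]

diff = enrichment(positives, negatives)
for i, d in enumerate(diff):
    print(f"position {i}: G enrichment {d:+.2f}")
```

Positions with a large positive difference would be candidates for G-enriched regions; on real data a statistical test would be needed before drawing any conclusion.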
