TransAC4C: a novel interpretable architecture for multi-species identification of N4-acetylcytidine sites in RNA with single-base resolution.

Author information

Liu Ruijie, Zhang Yuanpeng, Wang Qi, Zhang Xiaoping

Affiliations

Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430000, China.

Shenzhen Huazhong University of Science and Technology Research Institute, Shenzhen, 518000, China.

Publication information

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae200.

Abstract

N4-acetylcytidine (ac4C) is a modification found in ribonucleic acid (RNA) related to diseases. Expensive and labor-intensive methods hindered the exploration of ac4C mechanisms and the development of specific anti-ac4C drugs. Therefore, an advanced prediction model for ac4C in RNA is urgently needed. Despite the construction of various prediction models, several limitations exist: (1) insufficient resolution at base level for ac4C sites; (2) lack of information on species other than Homo sapiens; (3) lack of information on RNA other than mRNA; and (4) lack of interpretation for each prediction. In light of these limitations, we have reconstructed the previous benchmark dataset and introduced a new dataset including balanced RNA sequences from multiple species and RNA types, while also providing base-level resolution for ac4C sites. Additionally, we have proposed a novel transformer-based architecture and pipeline for predicting ac4C sites, allowing for highly accurate predictions, visually interpretable results and no restrictions on the length of input RNA sequences. Statistically, our work has improved the accuracy of predicting specific ac4C sites in multiple species from less than 40% to around 85%, achieving a high AUC > 0.9. These results significantly surpass the performance of all existing models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aaf8/11066922/179d7ac10de5/bbae200f1.jpg
