Transformers significantly improve splice site prediction.

Author information

Jónsson Benedikt A, Halldórsson Gísli H, Árdal Steinþór, Rögnvaldsson Sölvi, Einarsson Eyþór, Sulem Patrick, Guðbjartsson Daníel F, Melsted Páll, Stefánsson Kári, Úlfarsson Magnús Ö

Affiliations

deCODE Genetics/Amgen Inc., Reykjavik, Iceland.

University of Iceland, Reykjavik, Iceland.

Publication information

Commun Biol. 2024 Dec 4;7(1):1616. doi: 10.1038/s42003-024-07298-9.

Abstract

Mutations that affect RNA splicing significantly impact human diversity and disease. Here we present a method using transformers, a type of machine learning model, to detect splicing from raw 45,000-nucleotide sequences. We generate embeddings with residual neural networks and apply hard attention to select splice site candidates, enabling efficient training on long sequences. Our method surpasses the leading tool, SpliceAI, in detecting splice sites in GENCODE and ENSEMBL annotations. Using extensive RNA sequencing data from an Icelandic cohort of 17,848 individuals and the Genotype-Tissue Expression (GTEx) project, our method demonstrates superior performance in detecting splice junctions compared to SpliceAI-10k (PR-AUC = 0.834 vs. PR-AUC = 0.820) and is more effective at identifying disease-related splice variants in ClinVar (PR-AUC = 0.997 vs. PR-AUC = 0.996). These advancements hold promise for improving genetic research and clinical diagnostics, potentially leading to better understanding and treatment of splicing-related diseases.
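
The hard-attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how candidates are chosen, so the top-k rule, the function names `select_candidates` and `gather`, and the toy scores below are all assumptions made for illustration, not the authors' implementation. The point is only that self-attention cost grows quadratically with the number of positions attended over, so keeping a small set of candidates out of a 45,000-nucleotide window makes training on long sequences far cheaper.

```python
import heapq
from typing import List, Sequence


def select_candidates(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring positions, in sequence order.

    Assumption: candidates are picked by a plain top-k over per-position
    scores; the source does not specify the actual selection rule.
    """
    if k <= 0:
        return []
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


def gather(embeddings: Sequence[Sequence[float]], indices: Sequence[int]) -> List[List[float]]:
    """Keep only the embedding vectors at the selected positions."""
    return [list(embeddings[i]) for i in indices]


# Toy input: 8 positions, each with a 2-dimensional embedding and one score.
# In the setting the abstract describes, the embeddings would come from a
# residual network run over the raw sequence.
scores = [0.1, 0.9, 0.2, 0.05, 0.8, 0.3, 0.7, 0.0]
embeddings = [[float(i), float(i) * 2.0] for i in range(len(scores))]

candidates = select_candidates(scores, k=3)
selected = gather(embeddings, candidates)
```

Only the rows in `selected` would then be passed to the attention layers; positions that were not selected are dropped outright rather than down-weighted, which is what distinguishes hard attention from soft attention.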

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a14/11618611/a99070cd342c/42003_2024_7298_Fig1_HTML.jpg
