OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species.

Authors

Chao Kuan-Hao, Mao Alan, Liu Anqi, Salzberg Steven L, Pertea Mihaela

Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA.

Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA.

Publication information

bioRxiv. 2025 Mar 23:2025.03.20.644351. doi: 10.1101/2025.03.20.644351.

DOI: 10.1101/2025.03.20.644351

PMID: 40166201

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11957165/

Abstract

The SpliceAI deep learning system is currently one of the most accurate methods for identifying splicing signals directly from DNA sequences. However, its utility is limited by its reliance on older software frameworks and human-centric training data. Here we introduce OpenSpliceAI, a trainable, open-source version of SpliceAI implemented in PyTorch to address these challenges. OpenSpliceAI supports both training from scratch and transfer learning, enabling seamless retraining on species-specific datasets and mitigating human-centric biases. Our experiments show that it achieves faster processing speeds and lower memory usage than the original SpliceAI code, allowing large-scale analyses of extensive genomic regions on a single GPU. Additionally, OpenSpliceAI's flexible architecture eases integration with established machine learning ecosystems, simplifying the development of custom splicing models for different species and applications. We demonstrate that OpenSpliceAI's output is highly concordant with SpliceAI. In silico mutagenesis (ISM) analyses confirm that both models rely on similar sequence features, and calibration experiments demonstrate similar score probability estimates.
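The abstract mentions in silico mutagenesis (ISM) without describing it. As a rough illustration of the general idea only, the sketch below substitutes every base of a sequence in turn and records how a scoring function's output changes. The function `toy_donor_score` is a hypothetical stand-in invented for this example (it simply checks for a GT dinucleotide at the sequence midpoint); it is not OpenSpliceAI's or SpliceAI's model, and the names here are not part of either tool's API.

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def toy_donor_score(seq: str) -> float:
    """Hypothetical scorer: 1.0 if 'GT' starts at the sequence midpoint, else 0.0.

    A real analysis would call a trained splice-site model here instead.
    """
    mid = len(seq) // 2
    return 1.0 if seq[mid:mid + 2] == "GT" else 0.0


def in_silico_mutagenesis(
    seq: str, score_fn: Callable[[str], float]
) -> List[Dict[str, float]]:
    """Return, for each position, the score change caused by each substitution."""
    reference_score = score_fn(seq)
    effects: List[Dict[str, float]] = []
    for i, ref_base in enumerate(seq):
        deltas: Dict[str, float] = {}
        for base in BASES:
            if base == ref_base:
                continue  # only score true substitutions
            mutant = seq[:i] + base + seq[i + 1:]
            deltas[base] = score_fn(mutant) - reference_score
        effects.append(deltas)
    return effects


effects = in_silico_mutagenesis("AACAGGTAAG", toy_donor_score)
for position, deltas in enumerate(effects):
    print(position, deltas)
```

With this toy scorer, only substitutions at the two GT positions change the score; with a trained model, the per-position deltas would indicate which sequence features the model relies on, which is the kind of comparison the abstract reports between the two implementations.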

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/afaa/11957165/8d4b0c1e91e1/nihpp-2025.03.20.644351v1-f0001.jpg
