OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species.

Author information

Chao Kuan-Hao, Mao Alan, Liu Anqi, Salzberg Steven L, Pertea Mihaela

Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA.

Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA.

Publication information

bioRxiv. 2025 Mar 23:2025.03.20.644351. doi: 10.1101/2025.03.20.644351.

Abstract

The SpliceAI deep learning system is currently one of the most accurate methods for identifying splicing signals directly from DNA sequences. However, its utility is limited by its reliance on older software frameworks and human-centric training data. Here we introduce OpenSpliceAI, a trainable, open-source version of SpliceAI implemented in PyTorch to address these challenges. OpenSpliceAI supports both training from scratch and transfer learning, enabling seamless re-training on species-specific datasets and mitigating human-centric biases. Our experiments show that it achieves faster processing speeds and lower memory usage than the original SpliceAI code, allowing large-scale analyses of extensive genomic regions on a single GPU. Additionally, OpenSpliceAI's flexible architecture makes for easier integration with established machine learning ecosystems, simplifying the development of custom splicing models for different species and applications. We demonstrate that OpenSpliceAI's output is highly concordant with SpliceAI. In silico mutagenesis (ISM) analyses confirm that both models rely on similar sequence features, and calibration experiments demonstrate similar score probability estimates.
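To make the in silico mutagenesis (ISM) idea mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not OpenSpliceAI's code or API: `toy_donor_score` is a hypothetical stand-in for a trained splice-site model, and the function names, the example sequence, and the scoring weights are all assumptions chosen for illustration. The sketch substitutes every base at every position, rescores the sequence, and records the change in score relative to the reference.

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def toy_donor_score(seq: str, pos: int) -> float:
    """Hypothetical scorer (NOT a real splice model).

    Rewards a 'GT' dinucleotide starting at `pos` and a 'G' just upstream.
    """
    score = 0.0
    if seq[pos:pos + 2] == "GT":
        score += 0.75
    if pos > 0 and seq[pos - 1] == "G":
        score += 0.25
    return score


def in_silico_mutagenesis(
    seq: str, pos: int, score_fn: Callable[[str, int], float]
) -> List[Dict[str, float]]:
    """For each position, return score(mutant) - score(reference) per base."""
    ref_score = score_fn(seq, pos)
    effects: List[Dict[str, float]] = []
    for i, ref_base in enumerate(seq):
        row: Dict[str, float] = {}
        for alt in BASES:
            if alt == ref_base:
                row[alt] = 0.0  # the reference base has no effect by definition
            else:
                mutant = seq[:i] + alt + seq[i + 1:]
                row[alt] = score_fn(mutant, pos) - ref_score
        effects.append(row)
    return effects


def importance(effects: List[Dict[str, float]]) -> List[float]:
    """Summarize each position by its largest absolute score change."""
    return [max(abs(delta) for delta in row.values()) for row in effects]


if __name__ == "__main__":
    sequence = "AAGGTAAG"   # illustrative sequence with 'GT' at index 3
    site = 3                # position whose score is tracked
    per_base = in_silico_mutagenesis(sequence, site, toy_donor_score)
    for base, weight in zip(sequence, importance(per_base)):
        print(base, weight)
```

With a real model, `score_fn` would wrap a forward pass over the mutated sequence. Comparing the resulting importance profiles from two models is one way to check whether they rely on similar sequence features, which is the kind of comparison the abstract describes.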

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/afaa/11957165/8d4b0c1e91e1/nihpp-2025.03.20.644351v1-f0001.jpg