CI-SpliceAI-Improving machine learning predictions of disease causing splicing variants using curated alternative splice sites.

Affiliations

School of Human Development and Health, Faculty of Medicine, University of Southampton, Hampshire, United Kingdom.

Vision, Learning and Control, Department of Electronics and Computer Science, Faculty of Engineering and Physical Sciences, University of Southampton, Hampshire, United Kingdom.

Publication information

PLoS One. 2022 Jun 3;17(6):e0269159. doi: 10.1371/journal.pone.0269159. eCollection 2022.

Abstract

BACKGROUND

It is estimated that up to 50% of all disease causing variants disrupt splicing. Because splicing is complex, our ability to predict which variants disrupt it is limited, which means that some patients are left without a diagnosis. The emergence of machine learning for targeted medicine holds great potential to improve prediction of splice disrupting variants. The recently published SpliceAI algorithm uses deep neural networks and has been reported to be more accurate than other commonly used methods.

METHODS AND FINDINGS

The original SpliceAI was trained on splice sites included in primary isoforms combined with novel junctions observed in GTEx data, which might introduce noise and de-correlate the machine learning input from its output. Limiting the training data to only validated and manually annotated primary and alternatively spliced GENCODE sites may improve predictive ability. All of these gene isoforms were collapsed (aggregated into one pseudo-isoform) and the SpliceAI architecture was retrained (CI-SpliceAI). Predictive performance on a newly curated dataset of 1,316 functionally validated variants from the literature was compared with the original SpliceAI, alongside MMSplice, MaxEntScan, and SQUIRLS. Both SpliceAI algorithms outperformed the other methods, with the original SpliceAI achieving an accuracy of ∼91% and CI-SpliceAI improving on this at ∼92% overall. Predictive accuracy increased for the majority of curated variants.
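The collapsing step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that "aggregating isoforms into one pseudo-isoform" means taking the union of every isoform's donor and acceptor positions, and it uses a hypothetical `Isoform` record (strand plus 1-based exon coordinates) and hypothetical helpers `splice_sites`, `collapse` and `label_positions`. The per-base labels 0/1/2 follow the neither/acceptor/donor ordering of SpliceAI's output channels; the exact base marked as each site is an assumed convention.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Isoform:
    """Hypothetical transcript record: strand and 1-based inclusive exon (start, end) pairs."""
    strand: str
    exons: tuple[tuple[int, int], ...]


def splice_sites(iso: Isoform) -> tuple[set[int], set[int]]:
    """Return (donor, acceptor) genomic positions at every intron of one isoform.

    Assumed convention: a donor is the last exonic base before an intron and an
    acceptor is the first exonic base after it, in transcription order.
    """
    exons = sorted(iso.exons)
    donors: set[int] = set()
    acceptors: set[int] = set()
    for (_, end_up), (start_down, _) in zip(exons, exons[1:]):
        if iso.strand == "+":
            donors.add(end_up)        # intron 5' end is at the lower coordinate
            acceptors.add(start_down)
        else:
            donors.add(start_down)    # on the minus strand the roles swap
            acceptors.add(end_up)
    return donors, acceptors


def collapse(isoforms: list[Isoform]) -> tuple[set[int], set[int]]:
    """Union the splice sites of all isoforms of a gene into one pseudo-isoform."""
    donors: set[int] = set()
    acceptors: set[int] = set()
    for iso in isoforms:
        d, a = splice_sites(iso)
        donors |= d
        acceptors |= a
    return donors, acceptors


def label_positions(start: int, end: int, donors: set[int], acceptors: set[int]) -> list[int]:
    """Per-base training labels over [start, end]: 0 = neither, 1 = acceptor, 2 = donor."""
    return [2 if p in donors else 1 if p in acceptors else 0 for p in range(start, end + 1)]


# Two hypothetical plus-strand isoforms; iso_b uses an alternative acceptor at 210.
iso_a = Isoform("+", ((100, 150), (200, 250), (300, 350)))
iso_b = Isoform("+", ((100, 150), (210, 250), (300, 350)))
donors, acceptors = collapse([iso_a, iso_b])
print(sorted(donors), sorted(acceptors))  # [150, 250] [200, 210, 300]
```

Training on the union rather than on a single primary isoform means an alternative site such as position 210 is labelled as a true acceptor instead of as "neither", which is the kind of input/output mismatch the abstract argues against.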

CONCLUSIONS

We show that including only manually annotated alternatively spliced sites in training data improves prediction of clinically relevant variants, and highlight avenues for further performance improvements.

Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a9a/9165884/688f1ae676a6/pone.0269159.g001.jpg