Xu Chencheng, Bao Suying, Chen Hao, Jiang Tao, Zhang Chaolin
Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Present address: Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.
bioRxiv. 2024 Apr 8:2024.03.22.586363. doi: 10.1101/2024.03.22.586363.
Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes, and mutations that cause dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However, current methods for the quantitative prediction of splice site usage still have limited accuracy. Here, we present DeltaSplice, a deep neural network model optimized to learn the impact of mutations on quantitative changes in alternative splicing from the comparative analysis of homologous genes. The model architecture enables DeltaSplice to perform "reference-informed prediction": it incorporates the known splice site usage of a reference gene sequence to improve its prediction of the effects of splicing-altering mutations. We benchmarked DeltaSplice and several other state-of-the-art methods on various prediction tasks, including the effect of evolutionary sequence divergence on lineage-specific splicing and splicing-altering mutations in human populations and neurodevelopmental disorders, and demonstrated that DeltaSplice consistently outperformed the other methods. DeltaSplice predicted ~15% of splicing quantitative trait loci (sQTLs) in the human brain as causal splicing-altering variants. It also predicted splicing-altering mutations outside the splice sites in a subset of patients affected by autism and other neurodevelopmental disorders, including 19 genes with recurrent splicing-altering mutations. Among the new candidate disease risk genes, one is involved in mitochondrial fusion, which is frequently disrupted in autism patients. Our work expands the capacity of splicing models, with potential applications in genetic diagnosis and the development of splicing-based precision medicine.
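The abstract does not say how the reference splice site usage enters the model, so the following is only an illustrative sketch of the general idea behind reference-informed prediction, not DeltaSplice's actual formulation. It assumes a hypothetical model that outputs a mutation-induced shift in logit-transformed splice site usage, which is then added to the known usage of the reference sequence. The function names, the logit parameterization, and the clipping constant are all assumptions introduced here for illustration.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a usage value, clipped away from 0 and 1 for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit: map a log-odds value back to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def reference_informed_usage(ref_usage, predicted_delta_logit):
    """Combine known reference usage with a predicted shift (hypothetical).

    ref_usage: measured splice site usage of the reference sequence, in [0, 1].
    predicted_delta_logit: assumed model output, the change in log-odds of
        usage caused by the mutation (negative values weaken the site).
    Returns the estimated usage of the mutated sequence, in (0, 1).
    """
    return sigmoid(logit(ref_usage) + predicted_delta_logit)


# Example: a site used in 90% of transcripts, with a predicted weakening shift.
mutant_usage = reference_informed_usage(0.9, -2.0)
delta_usage = mutant_usage - 0.9
```

Working in log-odds space is one way to keep the mutant estimate within the valid range while anchoring it to the measured reference value; a model could equally predict the change in usage directly.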