Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada.
Center for Human Genetics, University Hospitals Leuven, Leuven, Belgium.
Genome Med. 2024 Oct 15;16(1):119. doi: 10.1186/s13073-024-01383-8.
Congenital heart disease (CHD) is the most common congenital anomaly. Almost 90% of isolated cases have an unexplained genetic etiology after clinical testing. Non-canonical splice variants that disrupt mRNA splicing through the loss or creation of exon boundaries are not routinely captured and/or evaluated by standard clinical genetic tests. Recent computational algorithms such as SpliceAI have shown an ability to predict such variants, but are not specific to cardiac-expressed genes and transcriptional isoforms.
We used genome sequencing (GS) (n = 1101 CHD probands) and myocardial RNA-Sequencing (RNA-Seq) (n = 154 CHD and n = 43 cardiomyopathy probands) to identify and validate splice-disrupting variants, and to develop a heart-specific model for canonical and non-canonical splice variants that can be applied to patients with CHD and cardiomyopathy. A total of 2570 GS samples from the Medical Genome Reference Bank were analyzed as healthy controls.
Of 8583 rare DNA splice-disrupting variants initially identified using SpliceAI, 100 were associated with altered splice junctions in the corresponding patient myocardium, affecting 95 genes. Using strength of myocardial gene expression and genome-wide DNA variant features that were confirmed to affect splicing in myocardial RNA, we trained a machine learning model for predicting cardiac-specific splice-disrupting variants (AUC 0.86 on internal validation). In a validation set of 48 CHD probands, the cardiac-specific model outperformed a SpliceAI model alone (AUC 0.94 vs 0.67, respectively). Application of this model to an additional 947 CHD probands with only GS data identified canonical splice-disrupting variants in CHD genes in 1% of patients and non-canonical splice-disrupting variants in 11% of patients. Forty-nine percent of predicted splice-disrupting variants were intronic and >10 bp from existing splice junctions. The burden of high-confidence splice-disrupting variants in CHD genes was 1.28-fold higher in CHD cases than in healthy controls.
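The AUC comparison reported above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `roc_auc`, the toy labels, and both score lists are hypothetical and chosen only to show how two scoring models are compared on the same labeled variants. AUC is computed here as the probability that a randomly chosen true splice-disrupting variant scores higher than a randomly chosen non-disrupting one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: fraction of (positive, negative) pairs
    in which the positive scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = variant confirmed to alter splicing in RNA-Seq.
labels = [1, 1, 1, 0, 0, 0]
generic_scores = [0.9, 0.4, 0.3, 0.5, 0.35, 0.1]   # assumed generic predictor
cardiac_scores = [0.9, 0.8, 0.45, 0.5, 0.2, 0.1]   # assumed tissue-aware predictor

auc_generic = roc_auc(labels, generic_scores)
auc_cardiac = roc_auc(labels, cardiac_scores)
print(f"generic AUC: {auc_generic:.2f}, cardiac AUC: {auc_cardiac:.2f}")
```

In this toy example the tissue-aware scores rank more confirmed variants above unconfirmed ones, so their AUC is higher; the numbers bear no relation to the values reported in the study.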
A new cardiac-specific in silico model was developed using complementary GS and RNA-Seq data that improved genetic yield by identifying a significant burden of non-canonical splice variants associated with CHD that would not be detectable through panel or exome sequencing.