Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany.
Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany.
Am J Hum Genet. 2024 Feb 1;111(2):338-349. doi: 10.1016/j.ajhg.2023.12.011. Epub 2024 Jan 15.
Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, which complicates establishing causal links between genetic variants and disease. Several scoring methods have been devised to prioritize candidate genes, but they fail to capture the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals in a cohort of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. Its ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for discovering disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.
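The recurrence criterion applied to the heart cohort (a prioritized gene counts if nonsynonymous de novo variants hit it in two or more individuals) can be illustrated with a minimal Python sketch. It assumes the variants have already been called as de novo and annotated, and that the set of prioritized genes is given. The function name `recurrent_genes`, the tuple input format, and the consequence labels in `NONSYNONYMOUS` are hypothetical and are not taken from STIGMA; the sketch shows only the counting step, not how STIGMA scores genes.

```python
from collections import defaultdict

# Hypothetical consequence labels treated as "nonsynonymous"; a real pipeline
# would take these from its own variant annotation tool.
NONSYNONYMOUS = {"missense", "stop_gained", "frameshift", "splice_site"}


def recurrent_genes(de_novo_variants, prioritized_genes, min_individuals=2):
    """Map each prioritized gene to the number of distinct individuals carrying a
    nonsynonymous de novo variant in it, keeping only genes hit in at least
    `min_individuals` individuals.

    `de_novo_variants` is an iterable of (individual_id, gene, consequence)
    tuples that are assumed to be de novo calls already.
    """
    carriers = defaultdict(set)
    for individual_id, gene, consequence in de_novo_variants:
        if gene in prioritized_genes and consequence in NONSYNONYMOUS:
            carriers[gene].add(individual_id)  # a set counts each individual once
    return {
        gene: len(ids)
        for gene, ids in sorted(carriers.items())
        if len(ids) >= min_individuals
    }


if __name__ == "__main__":
    variants = [
        ("ind1", "GENE_A", "missense"),
        ("ind2", "GENE_A", "frameshift"),
        ("ind2", "GENE_A", "missense"),    # same individual, counted once
        ("ind3", "GENE_B", "missense"),
        ("ind4", "GENE_C", "synonymous"),  # not nonsynonymous, filtered out
        ("ind5", "GENE_C", "missense"),
        ("ind6", "GENE_D", "missense"),    # GENE_D is not prioritized
        ("ind7", "GENE_D", "stop_gained"),
    ]
    prioritized = {"GENE_A", "GENE_B", "GENE_C"}
    print(recurrent_genes(variants, prioritized))  # {'GENE_A': 2}
```

Counting distinct individuals with a set, rather than counting variant records, keeps an individual with two variants in the same gene from making that gene look recurrent on its own.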