Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel.
Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel.
Mol Syst Biol. 2024 Nov;20(11):1187-1206. doi: 10.1038/s44320-024-00061-6. Epub 2024 Sep 16.
Pathogenic variants underlying Mendelian diseases often disrupt the normal physiology of a few tissues and organs. However, variant effect prediction tools that aim to identify pathogenic variants are typically oblivious to tissue contexts. Here we report a machine-learning framework, denoted "Tissue Risk Assessment of Causality by Expression for variants" (TRACEvar, https://netbio.bgu.ac.il/TRACEvar/ ), that offers two advances. First, TRACEvar predicts pathogenic variants that disrupt the normal physiology of specific tissues. This was achieved by creating 14 tissue-specific models that were trained on over 14,000 variants and combined 84 attributes of genetic variants with 495 attributes derived from tissue omics. TRACEvar outperformed 10 well-established, tissue-oblivious variant effect prediction tools. Second, the resulting models are interpretable, thereby illuminating variants' mode of action. Application of TRACEvar to variants of 52 rare-disease patients highlighted pathogenicity mechanisms and relevant disease processes. Lastly, the interpretation of all tissue models revealed that top-ranking determinants of pathogenicity included attributes of disease-affected tissues, particularly cellular process activities. Collectively, these results show that tissue contexts and interpretable machine-learning models can greatly enhance the elucidation of rare-disease etiology.
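As a rough illustration of how variant-level and tissue-level attributes might be combined, the sketch below builds one 579-column feature row (84 variant attributes plus 495 tissue-omics attributes) per variant for each tissue, yielding one training table per tissue-specific model. The data layout, the helper names (`build_feature_vector`, `build_tissue_datasets`), and the choice to skip genes lacking data in a given tissue are hypothetical assumptions for illustration, not TRACEvar's published implementation; the downstream classifier and its interpretation are omitted.

```python
from typing import Dict, List, Sequence

N_VARIANT_ATTRS = 84   # per-variant attributes (count taken from the abstract)
N_TISSUE_ATTRS = 495   # tissue-omics attributes (count taken from the abstract)


def build_feature_vector(variant_attrs: Sequence[float],
                         tissue_attrs: Sequence[float]) -> List[float]:
    """Concatenate variant-level and tissue-level attributes into one row.

    Hypothetical helper: the actual TRACEvar feature layout may differ.
    """
    if len(variant_attrs) != N_VARIANT_ATTRS:
        raise ValueError(f"expected {N_VARIANT_ATTRS} variant attributes")
    if len(tissue_attrs) != N_TISSUE_ATTRS:
        raise ValueError(f"expected {N_TISSUE_ATTRS} tissue attributes")
    return list(variant_attrs) + list(tissue_attrs)


def build_tissue_datasets(
    variants: Dict[str, Sequence[float]],
    gene_of: Dict[str, str],
    tissue_omics: Dict[str, Dict[str, Sequence[float]]],
) -> Dict[str, Dict[str, List[float]]]:
    """Return one feature table per tissue: {tissue: {variant_id: row}}.

    Assumed layout: tissue_omics[tissue][gene] holds that gene's
    tissue-derived attributes. Each per-tissue table would then be used
    to train its own (hypothetical) tissue-specific classifier.
    """
    datasets: Dict[str, Dict[str, List[float]]] = {}
    for tissue, gene_attrs in tissue_omics.items():
        table: Dict[str, List[float]] = {}
        for variant_id, vattrs in variants.items():
            gene = gene_of[variant_id]
            if gene in gene_attrs:  # skip genes without data for this tissue
                table[variant_id] = build_feature_vector(vattrs, gene_attrs[gene])
        datasets[tissue] = table
    return datasets


# Toy usage with made-up values: one variant in hypothetical gene GENE_A.
variants = {"var1": [0.0] * N_VARIANT_ATTRS}
gene_of = {"var1": "GENE_A"}
tissue_omics = {"heart": {"GENE_A": [1.0] * N_TISSUE_ATTRS}, "brain": {}}
ds = build_tissue_datasets(variants, gene_of, tissue_omics)
print(len(ds["heart"]["var1"]))  # 579
print(len(ds["brain"]))          # 0
```

Keeping one table per tissue mirrors the abstract's design of 14 separate tissue-specific models rather than a single tissue-oblivious model.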