Zhao Yige, Lan Tian, Zhong Guojie, Hagen Jake, Pan Hongbing, Chung Wendy K, Shen Yufeng
Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032.
The Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY 10032.
medRxiv. 2025 Apr 11:2023.12.11.23299809. doi: 10.1101/2023.12.11.23299809.
Accurately predicting the effect of missense variants is important for discovering disease risk genes and for clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture a variant's quantitative impact on fitness in humans. We developed MisFit, a method that estimates the fitness effect of missense variants using a graphical model. MisFit jointly models the effect at the molecular level (𝑑) and at the population level (selection coefficient, 𝑠), assuming that missense variants in the same gene with similar 𝑑 have similar 𝑠. We trained it by maximizing the probability of the observed allele counts in 236,017 European individuals. We show that 𝑠 is informative in predicting allele frequency across ancestries and is consistent with the fraction of mutations in sites under strong selection. Further, 𝑠 outperforms previous methods in prioritizing missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts 𝑠 and yields new insights from genomic data.
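To make the idea of fitting a selection coefficient by maximizing the probability of observed allele counts concrete, here is a minimal toy sketch. It is not MisFit's graphical model. It assumes, purely for illustration, a deterministic mutation-selection balance in which the expected allele frequency is mu / s (selection acting on heterozygotes), and Poisson sampling of the allele count among 2N chromosomes. The function names, the grid search, and the per-site mutation rate `mu` are all hypothetical.

```python
import math


def log_likelihood(s, allele_count, n_individuals, mu):
    """Poisson log-likelihood of an observed allele count.

    Toy assumption (not from the paper): the expected allele frequency is
    mu / s, the mutation-selection balance for selection on heterozygotes.
    """
    expected_freq = min(mu / s, 1.0)
    # Expected number of copies among the 2N sampled chromosomes.
    lam = 2 * n_individuals * expected_freq
    return allele_count * math.log(lam) - lam - math.lgamma(allele_count + 1)


def estimate_s(allele_count, n_individuals, mu, grid_size=400):
    """Grid-search maximum-likelihood estimate of s on a log scale, 1e-5 to 1."""
    grid = [10 ** (-5 + 5 * i / (grid_size - 1)) for i in range(grid_size)]
    return max(
        grid,
        key=lambda s: log_likelihood(s, allele_count, n_individuals, mu),
    )


if __name__ == "__main__":
    n = 236_017   # sample size quoted in the abstract
    mu = 1e-8     # hypothetical per-site mutation rate
    for count in (0, 1, 10, 100):
        print(count, estimate_s(count, n, mu))
```

Under these assumptions, rarer variants receive larger estimates of 𝑠, and a variant that is never observed is pushed to the upper end of the grid. A per-variant estimate of this kind is very noisy, which is one motivation for sharing information across variants with similar molecular effect, as the abstract describes.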