Zeng Tony, Spence Jeffrey P, Mostafavi Hakhamanesh, Pritchard Jonathan K
Department of Genetics, Stanford University, Stanford CA.
Department of Biology, Stanford University, Stanford CA.
bioRxiv. 2024 Apr 10:2023.05.19.541520. doi: 10.1101/2023.05.19.541520.
Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely used metrics are severely underpowered to detect constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework that combines a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.