Lauer Larissa, Rivas Manuel A
Department of Statistics, Stanford, CA, USA, 94305.
Department of Biomedical Data Science, Stanford University, Stanford, CA, USA 94305.
bioRxiv. 2025 Jan 24:2025.01.23.634522. doi: 10.1101/2025.01.23.634522.
Rare variant association studies (RVAS) of complex traits have emerged as a powerful approach to advance drug discovery and diagnostics. Missense pathogenicity predictions from AlphaMissense, which are based on structural context and protein language models, improve the differentiation between benign and deleterious variants. Constraint metrics, on the other hand, allow researchers to pinpoint genomic regions under selective pressure that may not directly impact protein structure but are more likely to contain functionally important mutations. Loss-of-function (LoF) variants, which result in the complete or partial loss of protein function, are particularly informative because their downstream functional consequences are more straightforward to assess. In this study, we present a unified meta-regression model that uses three kinds of features to model the observed effect size, and its uncertainty, obtained from single-variant genetic analysis: the probability of pathogenicity, the probability of constraint, and an indicator of whether a variant is a predicted loss-of-function or missense variant. We applied the unified meta-regression model to 1,144 continuous phenotypes from UK Biobank using single-variant summary statistics obtained from Genebass. We replicated our findings using the All of Us cohort. For each gene discovery, we make available a characterization of whether constrained sites are associated with the phenotype, whether pathogenic sites determined by structure-based predictions are associated with the phenotype, and whether broader loss-of-function or missense variant annotation better explains the observed summary statistics. Our results are publicly available at the Global Biobank Engine (https://biobankengine.shinyapps.io/phenome-wide-unified-model/).
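To make the idea of regressing single-variant effect sizes on variant annotations concrete, the following is a minimal sketch of a generic fixed-effect, inverse-variance weighted meta-regression. It is not the authors' implementation: the abstract does not specify the likelihood, priors, or fitting procedure. The function name `fit_meta_regression`, the column layout of the design matrix, the choice to set the pathogenicity score to 0 for LoF variants, and all toy numbers are assumptions made purely for illustration.

```python
import numpy as np


def fit_meta_regression(beta, se, X):
    """Weighted least squares for beta_i ~ Normal(x_i^T gamma, se_i^2).

    beta : per-variant effect size estimates
    se   : per-variant standard errors of those estimates
    X    : per-variant feature matrix (one row per variant)
    Returns the coefficient estimates and their standard errors.
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    X = np.asarray(X, dtype=float)
    w = 1.0 / se**2                      # inverse-variance weights
    xtwx = X.T @ (w[:, None] * X)        # X^T W X
    xtwy = X.T @ (w * beta)              # X^T W beta
    cov = np.linalg.inv(xtwx)            # coefficient covariance (fixed-effect model)
    gamma = cov @ xtwy
    return gamma, np.sqrt(np.diag(cov))


# Hypothetical toy data for one gene (values invented for illustration).
# Assumed columns: [pathogenicity prob., constraint prob., is_LoF, is_missense].
# No intercept is used because the two indicators already partition the variants.
X = np.array([
    [0.9, 0.8, 0.0, 1.0],   # missense
    [0.1, 0.3, 0.0, 1.0],   # missense
    [0.5, 0.9, 0.0, 1.0],   # missense
    [0.0, 0.7, 1.0, 0.0],   # LoF (pathogenicity score set to 0 by assumption)
    [0.0, 0.2, 1.0, 0.0],   # LoF
])
beta = np.array([0.42, -0.03, 0.35, -0.95, -0.80])  # single-variant effect sizes
se = np.array([0.20, 0.15, 0.25, 0.30, 0.35])       # their standard errors

gamma_hat, gamma_se = fit_meta_regression(beta, se, X)
for name, g, s in zip(["pathogenicity", "constraint", "LoF", "missense"],
                      gamma_hat, gamma_se):
    print(f"{name}: coefficient={g:.3f}, se={s:.3f}")
```

In this sketch, variants with smaller standard errors contribute more to the fit, which is one simple way to carry the uncertainty of each single-variant estimate into the gene-level model. Comparing the fitted coefficients indicates which annotation (pathogenicity, constraint, or the broad LoF/missense class) accounts for most of the signal in the summary statistics.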