Zeiberg Daniel, Tejura Malvika, McEwen Abbye E, Fayer Shawn, Pejaver Vikas, Rubin Alan F, Starita Lea M, Fowler Douglas M, O'Donnell-Luria Anne, Radivojac Predrag
Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA.
Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
bioRxiv. 2025 May 4:2025.04.29.651326. doi: 10.1101/2025.04.29.651326.
High-throughput functional assays measure the effects of variants on macromolecular function and can aid in reclassifying the rapidly growing number of variants of uncertain significance. Under the current clinical variant classification guidelines, using functional data as a line of evidence to assert pathogenicity relies on determining assay score thresholds that define variants as functionally normal or functionally abnormal. These thresholds are designed to maximize the separation of variants with known clinical effects (benign, pathogenic) and often incorporate expert opinion. However, this approach lacks the rigor of calibration, in which a variant's posterior probability of pathogenicity must be estimated from the raw experimental score and mapped to discrete evidence strengths. To build upon the existing guidelines, we introduce and evaluate a method for calibrating continuous high-throughput functional data as a line of evidence in clinical variant classification. Assay score distributions of synonymous variants and variants appearing in gnomAD for a given functional scoreset are jointly modeled with score distributions of known pathogenic and benign variants using a multi-sample skew normal mixture of distributions. This model is learned using a constrained expectation-maximization algorithm that provably preserves the monotonicity of pathogenicity posteriors and is subsequently used to calculate variant-specific evidence strengths for use in the clinic. Using 24 datasets from 14 genes, we first assess the model's ability to capture assay score distributions. We then demonstrate its potential impact on reclassifying variants by comparing the evidence strengths assigned at the variant-level with those assigned uniformly to all functionally normal and abnormal variants under the existing ClinGen guidelines. 
An improved classification of variants will directly improve the accuracy of genetic diagnosis and subsequent medical management for individuals affected by Mendelian disorders.
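The calibration step described above, in which a raw assay score is turned into a posterior probability of pathogenicity and then into a discrete evidence strength, can be illustrated with a minimal sketch. This is not the authors' constrained expectation-maximization procedure. It assumes two already-fitted skew normal components (the `abnormal` and `normal` parameter tuples are hypothetical) and a hypothetical prior, and it assumes the evidence thresholds of the commonly used Bayesian adaptation of the ACMG/AMP framework, in which "very strong" corresponds to an odds of 350 and each weaker level is a successive square root. None of these values come from the abstract.

```python
import math

def skew_normal_pdf(x, loc, scale, shape):
    """Skew normal density: (2/scale) * phi(z) * Phi(shape * z), z = (x - loc) / scale."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))
    return 2.0 / scale * phi * Phi

def likelihood_ratio(score, abnormal, normal):
    """Local likelihood ratio: density under the abnormal-like component
    divided by density under the normal-like component.
    Each argument is a hypothetical (loc, scale, shape) tuple."""
    return skew_normal_pdf(score, *abnormal) / skew_normal_pdf(score, *normal)

def posterior_from_lr(lr, prior):
    """Posterior probability of pathogenicity: posterior odds = lr * prior odds."""
    odds = lr * prior / (1.0 - prior)
    return odds / (1.0 + odds)

def evidence_strength(lr, odds_very_strong=350.0):
    """Map a likelihood ratio to a discrete evidence level.
    Assumed thresholds: 350, 350**(1/2), 350**(1/4), 350**(1/8),
    with reciprocals used for evidence toward benignity."""
    for name, root in (("very strong", 1), ("strong", 2),
                       ("moderate", 4), ("supporting", 8)):
        threshold = odds_very_strong ** (1.0 / root)
        if lr >= threshold:
            return "pathogenic " + name
        if lr <= 1.0 / threshold:
            return "benign " + name
    return "indeterminate"

if __name__ == "__main__":
    # Hypothetical fitted components: low scores look functionally abnormal.
    abnormal = (-1.0, 0.3, 0.0)
    normal = (0.0, 0.3, 0.0)
    prior = 0.1  # hypothetical prior probability of pathogenicity
    for score in (-1.2, -0.5, 0.1):
        lr = likelihood_ratio(score, abnormal, normal)
        print(score, round(posterior_from_lr(lr, prior), 4), evidence_strength(lr))
```

With arbitrary skew normal parameters, the ratio of the two densities need not be monotone in the score, so a higher score could in principle receive a higher posterior than a lower one in some region. The abstract states that the constrained fitting algorithm provably preserves monotonicity of the posteriors; this sketch does not enforce that property.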