Livesey Benjamin J, Marsh Joseph A
MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.
Genome Biol. 2025 Apr 22;26(1):104. doi: 10.1186/s13059-025-03575-w.
Understanding the relationship between protein sequence and function is crucial for accurate classification of missense variants. Variant effect predictors (VEPs) play a vital role in deciphering this complex relationship, yet evaluating their performance remains challenging for several reasons, including data circularity, where the same or related data is used for training and assessment. High-throughput experimental strategies like deep mutational scanning (DMS) offer a promising solution.
In this study, we extend our previous benchmarking approach, assessing the performance of 97 VEPs using missense DMS measurements from 36 different human proteins. In addition, a new pairwise, VEP-centric approach mitigates the impact of missing predictions on the overall performance comparison. We observe a strong correspondence between VEP performance in DMS-based benchmarks and clinical variant classification, especially for predictors that have not been directly trained on human clinical variants.
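To illustrate how a pairwise, VEP-centric comparison can reduce the effect of missing predictions, the following is a minimal sketch and not the authors' implementation. It assumes that each pair of predictors is compared only on the variants both of them score, that agreement with DMS is measured by absolute Spearman correlation, and that a predictor's overall standing is its fraction of pairwise wins. The function names, the `min_shared` threshold, and the toy variants and scores are all hypothetical.

```python
from itertools import combinations
from math import isnan, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / sqrt(vx * vy)


def pairwise_win_rates(dms, predictions, min_shared=3):
    """Compare every pair of predictors on their shared variants only.

    dms:         {variant: DMS score}
    predictions: {predictor: {variant: score}}; missing variants are absent
    min_shared:  hypothetical minimum number of shared variants per pair
    """
    wins = {name: 0.0 for name in predictions}
    comparisons = {name: 0 for name in predictions}
    for a, b in combinations(predictions, 2):
        shared = [v for v in dms if v in predictions[a] and v in predictions[b]]
        if len(shared) < min_shared:
            continue
        truth = [dms[v] for v in shared]
        score_a = abs(spearman(truth, [predictions[a][v] for v in shared]))
        score_b = abs(spearman(truth, [predictions[b][v] for v in shared]))
        if isnan(score_a) or isnan(score_b):
            continue
        comparisons[a] += 1
        comparisons[b] += 1
        if score_a > score_b:
            wins[a] += 1
        elif score_b > score_a:
            wins[b] += 1
        else:  # a tie counts as half a win for each predictor
            wins[a] += 0.5
            wins[b] += 0.5
    return {
        name: wins[name] / comparisons[name] if comparisons[name] else float("nan")
        for name in predictions
    }


# Toy data (entirely made up): three predictors with different missing variants.
dms = {"A1V": 0.1, "A1G": 0.5, "L2P": 0.9, "K3E": 0.3, "R4W": 0.7}
predictions = {
    "vep_x": {"A1V": 1, "A1G": 3, "L2P": 5, "K3E": 2, "R4W": 4},
    "vep_y": {"A1V": 2, "A1G": 1, "L2P": 4, "K3E": 3},  # no score for R4W
    "vep_z": {"A1G": 2, "L2P": 1, "K3E": 3, "R4W": 4},  # no score for A1V
}
win_rates = pairwise_win_rates(dms, predictions)
```

Because each comparison uses only the variants that both predictors score, a predictor with sparse coverage is never ranked against another on variants it did not predict. Taking the absolute correlation is one way to handle predictors and assays whose score scales run in opposite directions, at the cost of also rewarding a consistently inverted predictor.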
Our results suggest that comparing VEP performance against diverse functional assays represents a reliable strategy for assessing their relative performance in clinical variant classification. However, major challenges in clinical interpretation of VEP scores persist, highlighting the need for further research to fully leverage computational predictors for genetic diagnosis. We also address practical considerations for end users in terms of choice of methodology.