MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
Dis Model Mech. 2022 Jun 1;15(6). doi: 10.1242/dmm.049510. Epub 2022 Jun 23.
Computational predictors of genetic variant effect have advanced rapidly in recent years. These programs provide clinical and research laboratories with a rapid and scalable method to assess the likely impacts of novel variants. However, it can be difficult to know to what extent their results can be trusted. To benchmark their performance, predictors are often tested against large datasets of known pathogenic and benign variants. These benchmarking data may overlap with the data used to train some supervised predictors, which leads to data re-use, or circularity, and in turn to inflated performance estimates for those predictors. Furthermore, new predictors are usually found by their authors to be superior to all previous predictors, which suggests some degree of computational bias in their benchmarking. Large-scale functional assays known as deep mutational scans offer one possible solution to this problem by providing independent datasets of variant effect measurements. In this Review, we discuss some of the key advances in predictor methodology, current benchmarking strategies and how data derived from deep mutational scans can be used to overcome the issue of data circularity. We also discuss the ability of such functional assays to directly predict the clinical impacts of mutations and how this might affect the future need for variant effect predictors.
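To make the circularity problem concrete, the following is a minimal sketch, not a method taken from the Review, of how a benchmark might be restricted to variants a supervised predictor never saw in training and then scored against deep mutational scan (DMS) measurements using a Spearman rank correlation. The function names, variant labels and scores are all hypothetical and chosen purely for illustration.

```python
from statistics import mean


def drop_training_overlap(benchmark, training_ids):
    """Keep only benchmark variants absent from the predictor's training set."""
    return {v: s for v, s in benchmark.items() if v not in training_ids}


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: predictor scores (higher = more damaging) and
# DMS fitness scores (lower = more damaging) for the same protein.
predictor_scores = {"A10V": 0.9, "G25D": 0.2, "L40P": 0.8, "R55H": 0.4, "S70T": 0.1}
dms_fitness = {"A10V": 0.15, "G25D": 0.95, "L40P": 0.05, "R55H": 0.60, "S70T": 1.00}
training_ids = {"A10V", "S70T"}  # variants assumed to be in the training data

held_out = drop_training_overlap(predictor_scores, training_ids)
shared = sorted(set(held_out) & set(dms_fitness))
rho = spearman([held_out[v] for v in shared], [dms_fitness[v] for v in shared])
print(f"Spearman correlation on {len(shared)} held-out variants: {rho:.2f}")
```

Because the DMS measurements in this sketch are assumed to be independent of any clinical training labels, a strongly negative correlation here would indicate agreement between predictor and assay without rewarding the predictor for memorising variants it was trained on.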