Ozkan Selen, Padilla Natàlia, de la Cruz Xavier
Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain.
Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
Hum Genet. 2025 Mar;144(2-3):191-208. doi: 10.1007/s00439-024-02692-z. Epub 2024 Jul 24.
Next-generation sequencing (NGS) has revolutionized genetic diagnostics, yet its application in precision medicine remains incomplete, despite significant advances in computational tools for variant annotation. Many variants remain unannotated, and existing tools often fail to accurately predict the range of impacts that variants have on protein function. This limitation restricts their utility in relevant applications such as predicting disease severity and onset age. In response to these challenges, a new generation of computational models is emerging, aimed at producing quantitative predictions of genetic variant impacts. However, the field is still in its early stages, and several issues need to be addressed, including improved performance and better interpretability. This study introduces QAFI, a novel methodology that integrates protein-specific regression models within an ensemble learning framework, utilizing conservation-based and structure-related features derived from AlphaFold models. Our findings indicate that QAFI significantly enhances the accuracy of quantitative predictions across various proteins. The approach has been rigorously validated through its application in the CAGI6 contest, focusing on ARSA protein variants, and further tested on a comprehensive set of clinically labeled variants, demonstrating its generalizability and robust predictive power. The straightforward nature of our models may also contribute to better interpretability of the results.
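The idea of combining protein-specific regression models in an ensemble can be illustrated with a minimal sketch. It is not the QAFI implementation: the abstract does not specify the regression form, the aggregation rule, or the exact features, so ordinary least squares, median aggregation, and three placeholder features are all assumptions made here for illustration. The data are synthetic, and every function name is hypothetical.

```python
import random
import statistics


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_protein_model(X, y):
    """Fit one protein-specific linear model (assumed OLS, normal equations).

    Returns [intercept, w1, ..., wk].
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


def predict_one(coef, x):
    """Predicted functional score of one variant under one model."""
    return coef[0] + sum(w * xi for w, xi in zip(coef[1:], x))


def ensemble_predict(models, X):
    """Aggregate the protein-specific predictions by their median (assumed rule)."""
    return [statistics.median(predict_one(c, x) for c in models) for x in X]


# Synthetic stand-in for per-protein training data. The three features are
# placeholders (e.g. a conservation score and two structure-derived values).
rng = random.Random(0)
TRUE_W = [0.6, -0.3, 0.2]


def simulate_protein(n_variants):
    w = [wi + rng.gauss(0, 0.05) for wi in TRUE_W]  # protein-specific weights
    X = [[rng.gauss(0, 1) for _ in TRUE_W] for _ in range(n_variants)]
    y = [0.5 + sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0, 0.1) for x in X]
    return X, y


# One model per training protein, then ensemble prediction for a new protein.
models = [fit_protein_model(*simulate_protein(200)) for _ in range(5)]
X_new = [[rng.gauss(0, 1) for _ in TRUE_W] for _ in range(10)]
scores = ensemble_predict(models, X_new)
```

Using the median rather than the mean is one way to limit the influence of a single poorly transferring protein-specific model. Linear models also keep each feature's contribution directly readable from its coefficient, which is in line with the interpretability point made in the abstract.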