Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey.
Network Technologies Department, TÜBİTAK-ULAKBİM Turkish Academic Network and Information Center, Ankara 06530, Turkey.
Mol Biol Evol. 2024 Jul 3;41(7). doi: 10.1093/molbev/msae136.
Most algorithms used to predict the effects of variants rely on evolutionary conservation. However, the majority of such techniques compute evolutionary conservation solely from multiple sequence alignments, overlooking the evolutionary context of substitution events. In our previous study, we introduced PHACT, a scoring-based pathogenicity predictor for missense mutations that leverages phylogenetic trees. Building on this foundation, we now propose PHACTboost, a gradient boosting tree-based classifier that combines PHACT scores with information from multiple sequence alignments, phylogenetic trees, and ancestral reconstruction. By learning from data, PHACTboost outperforms PHACT. Furthermore, comprehensive experiments on carefully constructed sets of variants demonstrated that PHACTboost can outperform 40 prevalent pathogenicity predictors reported in dbNSFP, including conventional tools, metapredictors, and deep learning-based approaches, as well as more recent tools such as AlphaMissense, EVE, and CPT-1. The superiority of PHACTboost over these methods was particularly evident for hard variants, those for which different pathogenicity predictors offered conflicting results. We provide predictions for 215 million amino acid alterations across 20,191 proteins. PHACTboost is available at https://github.com/CompGenomeLab/PHACTboost. PHACTboost can improve our understanding of genetic diseases and facilitate more accurate diagnoses.