Kuru Nurdan, Adebali Ogün
Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkiye.
Biological Sciences, TÜBİTAK Research Institute for Fundamental Sciences, Gebze 41470, Turkiye.
Mol Biol Evol. 2025 Jul 1;42(7). doi: 10.1093/molbev/msaf150.
The coevolution trends of amino acids within or between genes offer key insights into protein structure and function. Existing tools for uncovering coevolutionary signals primarily rely on multiple sequence alignments, often overlooking phylogenetic relatedness and shared evolutionary history. Here, we introduce PHACE, a phylogeny-aware coevolution algorithm that maps amino acid substitutions onto a phylogenetic tree to detect molecular coevolution. PHACE categorizes amino acids at each position into "tolerable" and "intolerable" groups, based on their independent recurrence across the tree, reflecting a position's tolerance to specific substitutions. Gaps are treated as a third character type, with only phylogenetically independent gap changes considered. The method computes substitution scores per branch by traversing the tree and quantifying probability differences across adjacent nodes for each group. To avoid artifacts from alignment errors, we apply a multiple sequence alignment-masking procedure. Compared to phylogeny-based methods (CAPS, CoMap) and state-of-the-art multiple sequence alignment-based approaches (DCA, GaussDCA, PSICOV, mutual information), PHACE shows significantly superior accuracy in identifying coevolving residue pairs, as measured by statistical metrics including Matthews correlation coefficient, area under the ROC curve, and F1 score. This performance stems from PHACE's explicit modeling of phylogenetic dependencies, often ignored in coevolution analyses.
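The branch-level idea described above can be illustrated with a minimal sketch. It is not the PHACE implementation: the toy tree, the recurrence threshold `min_recurrence`, the use of absolute probability differences, and the Pearson comparison of per-branch score vectors are all assumptions made for illustration, and the per-node group probabilities are supplied by hand rather than inferred by ancestral reconstruction. Gap handling and alignment masking are left out.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy tree, written as (parent, child) branches.
BRANCHES = [("root", "n1"), ("root", "n2"),
            ("n1", "A"), ("n1", "B"), ("n2", "C"), ("n2", "D")]


def classify_residues(independent_gains, min_recurrence=2):
    """Split residues seen at one position into two groups.

    `independent_gains` lists one residue per independent origin on the
    tree. A residue arising at least `min_recurrence` times is treated as
    tolerable; the threshold is an assumption, not a PHACE parameter.
    """
    counts = Counter(independent_gains)
    tolerable = {aa for aa, n in counts.items() if n >= min_recurrence}
    intolerable = set(counts) - tolerable
    return tolerable, intolerable


def branch_scores(node_probs, branches):
    """Score each branch by the change in a group's probability
    between its parent node and its child node."""
    return [abs(node_probs[child] - node_probs[parent])
            for parent, child in branches]


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a position with no change carries no signal
    return cov / sqrt(vx * vy)


# Probability that each node keeps the ancestral residue group
# (hand-made values standing in for ancestral reconstruction output).
pos_i = {"root": 1.0, "n1": 1.0, "n2": 0.0, "A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0}
pos_j = dict(pos_i)                      # changes on the same branch as pos_i
pos_k = {"root": 1.0, "n1": 1.0, "n2": 1.0, "A": 0.0, "B": 1.0, "C": 1.0, "D": 1.0}

scores_i = branch_scores(pos_i, BRANCHES)
scores_j = branch_scores(pos_j, BRANCHES)
scores_k = branch_scores(pos_k, BRANCHES)

same_branch = pearson(scores_i, scores_j)   # changes coincide on root->n2
diff_branch = pearson(scores_i, scores_k)   # changes fall on different branches
groups = classify_residues(["V", "V", "I", "P"])
```

In this toy case the single substitution at positions i and j sits on the branch root->n2, so it is counted once even though two leaves (C and D) inherit it. An alignment-column comparison would instead see two sequences agreeing at both positions, which is the phylogenetic dependency the abstract refers to.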