Puente-Lelievre Caroline, Malik Ashar, Douglas Jordan
School of Biological Sciences, The University of Auckland, Auckland, New Zealand.
Centre for Computational Evolution, The University of Auckland, Auckland, New Zealand.
Genome Biol Evol. 2025 Jul 30;17(8). doi: 10.1093/gbe/evaf139.
Protein structural phylogenetics is an interdisciplinary branch of molecular evolution that (i) uses 3D structural data to trace evolutionary histories, and (ii) uses these evolutionary relationships to explore the diversity of protein structures and their ancestral functions. The appeal of extracting phylogenetic information from protein structure lies in the greater conservation of structure compared with sequence, reflecting its resilience to mutation over long evolutionary timescales. Leveraging this information is particularly useful for examining relationships within the "twilight zone", a region of low protein sequence similarity in which distinguishing signal from noise becomes challenging. Historically, the field has been constrained by the limited availability of high-resolution structural data. However, recent breakthroughs in artificial intelligence have made high-quality protein structural data widely accessible. Although methods for constructing phylogenetic trees from protein structures have progressed significantly since the distance-based approaches of the 1970s, this area of research still lags behind sequence-based phylogenetics, which employs advanced probabilistic models, particularly Bayesian and maximum likelihood approaches. This article reviews the current state of protein structural phylogenetics, outlines methods for extracting evolutionary insights from structural data, and highlights key applications and future directions. Given the surge of newly available structural information, sequence and structural data are expected to become routinely integrated in phylogenetic analyses, poising us to venture further into the twilight zone and to form cross-disciplinary and translational collaborations.
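To make the idea of a distance-based structural tree concrete, here is a minimal sketch; it is not the method of this review. It assumes a hypothetical table of pairwise TM-scores (a structural similarity score in (0, 1]) for four made-up proteins A to D. Each score is converted to a distance as 1 - TM-score, which is one common choice among several, and the proteins are then clustered with UPGMA, one classical distance method. Neighbor joining is a frequent alternative. The function name `upgma` and all scores are illustrative assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted UPGMA tree and return it as a Newick string.

    names: leaf labels; dist: {(a, b): distance} for every unordered pair.
    """
    # Each cluster key maps to (Newick subtree, number of leaves, node height).
    clusters = {name: (name, 1, 0.0) for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        key = ("node", next_id)  # internal-node key that cannot clash with a leaf label
        next_id += 1
        # Distance from the merged cluster is the leaf-count-weighted average.
        for c in clusters:
            d[frozenset((key, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        # Branch length = new node height minus child height (ultrametric tree).
        subtree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        clusters[key] = (subtree, size_a + size_b, height)
    ((tree, _, _),) = clusters.values()
    return tree + ";"


# Hypothetical TM-scores for four made-up proteins.
tm = {("A", "B"): 0.9, ("A", "C"): 0.6, ("A", "D"): 0.5,
      ("B", "C"): 0.6, ("B", "D"): 0.5, ("C", "D"): 0.8}
dist = {pair: 1.0 - score for pair, score in tm.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)
```

UPGMA assumes a constant rate of change across lineages, so it yields an ultrametric tree. This assumption is one reason the probabilistic models used in sequence-based phylogenetics are generally preferred where they are available.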