Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.
Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan.
Syst Biol. 2020 Mar 1;69(2):265-279. doi: 10.1093/sysbio/syz049.
A protein superfamily contains distantly related proteins that have acquired diverse biological functions through a long evolutionary history. Phylogenetic analysis of the early evolution of protein superfamilies is a key challenge because existing phylogenetic methods perform poorly when protein sequences are too diverged to construct an informative multiple sequence alignment (MSA). Here, we propose the Graph Splitting (GS) method, which rapidly reconstructs a protein superfamily-scale phylogenetic tree using a graph-based approach. Evolutionary simulation showed that the GS method can accurately reconstruct phylogenetic trees and is robust to major problems in phylogenetic estimation, such as biased taxon sampling, heterogeneous evolutionary rates, and long-branch attraction, when sequences are substantially diverged. Its application to an empirical data set of the triosephosphate isomerase (TIM)-barrel superfamily suggests rapid evolution of protein-mediated pyrimidine biosynthesis, likely taking place after the RNA world. Furthermore, the GS method can substantially improve the performance of widely used MSA methods by providing accurate guide trees.