Infectious Disease Responses Laboratory, University of New South Wales, Sydney, NSW, Australia.
Methods Mol Biol. 2021;2212:1-15. doi: 10.1007/978-1-0716-0947-7_1.
A mass-based protein phylogeny method, known as phylonumerics, is described that builds phylogenetic-like trees using a purpose-built MassTree algorithm. These trees are constructed from sets of numerical mass map data for each protein without the need for gene or protein sequences. Such trees have been shown to be highly congruent with conventional sequence-based trees and provide a reliable means to study the evolutionary history of organisms. Mutations are determined by the algorithm from the differences in mass of peptide pairs across different mass sets and are displayed at branch nodes across the tree. Because the trees display a phylogeny representing expressed proteins, all mutations are by definition non-synonymous. The algorithm outputs the frequency of these mutations and a mutation score, which is a sum of these frequencies weighted by their position relative to the root of the tree. It also outputs lists of pairs of mutations separated along interconnected branches of the tree. Pairs that co-occur, or that occur consecutively or near consecutively, and that are separated by a distance less than the average distance for all mutation pairs are putatively assigned as epistatic pairs. These pairs are examined further with a focus on non-conservative substitutions, given their importance in driving structural and functional change as well as protein and organismal evolution. The application of the method is demonstrated for the H3 hemagglutinin protein of type A human H3N2 strains of the influenza virus. The most frequent ancestral mutations within epistatic pairs occur within antigenic site domains, while the descendant mutations occur either at other antigenic sites or elsewhere in the protein. Both predominate at reported glycosylation sites. The results for this protein further support a "small steps" evolutionary model for the influenza virus, in which non-conservative mutations that involve the least structural change are favored over those involving substantive change, which may risk the virus's own extinction.
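The two computational ideas named in the abstract, assigning a substitution from the mass difference of a peptide pair and flagging mutation pairs separated by less than the average distance, can be illustrated with a minimal Python sketch. This is not the MassTree implementation. The function names, the 0.005 Da tolerance, and the tuple layout are assumptions made for illustration, and the co-occurrence and consecutiveness criteria are omitted. Only the standard monoisotopic residue masses are established values.

```python
from itertools import permutations
from statistics import mean

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_substitutions(mass_a, mass_b, tol=0.005):
    """List (from, to) residue pairs consistent with a peptide mass shift.

    mass_a and mass_b are the masses of a peptide pair taken from two
    mass sets. The tolerance is an assumed value, not one from the source.
    """
    delta = mass_b - mass_a
    if abs(delta) <= tol:
        # No measurable shift (this also covers the isobaric I/L pair).
        return []
    return [
        (src, dst)
        for src, dst in permutations(RESIDUE_MASS, 2)
        if abs((RESIDUE_MASS[dst] - RESIDUE_MASS[src]) - delta) <= tol
    ]


def putative_epistatic_pairs(pairs):
    """Keep mutation pairs whose tree distance is below the mean distance.

    pairs is a list of (ancestral, descendant, distance) tuples. This
    simplified filter applies only the below-average-distance criterion.
    """
    if not pairs:
        return []
    average = mean(distance for _, _, distance in pairs)
    return [pair for pair in pairs if pair[2] < average]


if __name__ == "__main__":
    # A shift of about +30.0106 Da matches more than one substitution.
    print(candidate_substitutions(1000.50000, 1030.51057))
    # Placeholder labels and distances, not data from the study.
    demo = [("mut_a", "mut_b", 1.0), ("mut_c", "mut_d", 2.0),
            ("mut_e", "mut_f", 6.0)]
    print(putative_epistatic_pairs(demo))
```

A single mass difference can match several substitutions (for example, A to T and G to S both add the same mass), so a lookup of this kind yields candidates rather than unique assignments. How such ambiguity is resolved is not covered by this sketch.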