Noah Katherine E, Hao Jiasheng, Li Luyan, Sun Xiaoyan, Foley Brian, Yang Qun, Xia Xuhua
Department of Biology, University of Ottawa, Ottawa, ON, Canada.
College of Life Sciences, Anhui Normal University, Wuhu, China.
Evol Bioinform Online. 2020 Feb 5;16:1176934320903735. doi: 10.1177/1176934320903735. eCollection 2020.
Deep phylogeny involving arthropod lineages is difficult to recover because the erosion of phylogenetic signals over time leads to unreliable multiple sequence alignment (MSA) and, consequently, unreliable phylogenetic reconstruction. One way to alleviate the problem is to assemble a large number of gene sequences into a supermatrix to compensate for the weakness of each individual gene. This approach has led to many robustly supported but contradictory phylogenies. A close examination shows that the supermatrix approach often suffers from two shortcomings. The first is that the MSA is rarely checked for reliability and, as will be illustrated, can be poor. The second concerns homoplasy at the third codon position of protein-coding genes caused by convergent evolution of nucleotide frequencies: to alleviate it, phylogeneticists may remove or degenerate the third codon position, but may do so improperly and introduce new biases. We performed an extensive reanalysis of one such "big data" set to highlight these two problems and to demonstrate the benefits of correcting or alleviating them. Our results support a new group with Xiphosura and Arachnopulmonata (Tetrapulmonata + Scorpiones) as sister taxa. This favors a new hypothesis in which the ancestor of Xiphosura and the extinct Eurypterida (sea scorpions, many later forms of which lived in brackish water or freshwater) returned to the sea after the initial chelicerate invasion of land. Our phylogeny is supported even by the original data when they are processed with a new "principled" codon degeneration. We also show that removing the 1673 codon sites with both AGN and UCN codons (encoding serine) from our alignment can partially reconcile discrepancies between nucleotide-based and amino acid (AA)-based trees, partly because two sequences, one with AGN and the other with UCN, would be identical at the amino acid level but quite different at the nucleotide level.
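The serine-codon filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`serine_family`, `mixed_serine_columns`, `drop_codon_columns`) and the `ag_serine_thirds` parameter are hypothetical, and the input is assumed to be an in-frame codon alignment with all sequences of equal length. By default only AGU and AGC are treated as serine, as in the standard genetic code; passing `"UCAG"` treats every AGN codon as serine, as in the invertebrate mitochondrial code.

```python
from typing import List


def serine_family(codon: str, ag_serine_thirds: str = "UC") -> str:
    """Return 'UCN' or 'AGN' if the codon is a serine codon of that family, else ''.

    ag_serine_thirds is an assumption of this sketch: it lists the third-position
    bases for which AG? is counted as serine ("UC" for the standard code).
    """
    c = codon.upper().replace("T", "U")  # accept DNA or RNA alphabets
    if len(c) != 3:
        return ""
    if c[:2] == "UC" and c[2] in "UCAG":
        return "UCN"
    if c[:2] == "AG" and c[2] in ag_serine_thirds:
        return "AGN"
    return ""


def mixed_serine_columns(seqs: List[str], ag_serine_thirds: str = "UC") -> List[int]:
    """Return 0-based indices of codon columns containing both AGN and UCN serine codons."""
    n_codons = len(seqs[0]) // 3
    mixed = []
    for i in range(n_codons):
        families = {serine_family(s[3 * i:3 * i + 3], ag_serine_thirds) for s in seqs}
        if "AGN" in families and "UCN" in families:
            mixed.append(i)
    return mixed


def drop_codon_columns(seqs: List[str], columns: List[int]) -> List[str]:
    """Return the alignment with the given codon columns removed from every sequence."""
    drop = set(columns)
    return [
        "".join(s[3 * i:3 * i + 3] for i in range(len(s) // 3) if i not in drop)
        for s in seqs
    ]


# Toy alignment (invented for illustration): three sequences, three codon columns.
toy = ["AGTGCTTCA", "TCAGCTTCG", "AGCGCATCT"]
mixed = mixed_serine_columns(toy)        # only the first column mixes AGN and UCN
filtered = drop_codon_columns(toy, mixed)
```

In the toy alignment, the first column holds AGT, TCA, and AGC: all three encode serine, yet AGT and TCA differ at every nucleotide position, which is the kind of site where nucleotide-based and amino acid-based distances disagree. The third column contains only UCN codons and is therefore kept.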