Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, UK.
Proc Biol Sci. 2011 Apr 7;278(1708):1009-18. doi: 10.1098/rspb.2010.1427. Epub 2010 Sep 29.
We have developed a machine-learning approach to identify 3537 discrete orthologue protein sequence groups distributed across all available archaeal genomes. We show that treating these orthologue groups as binary detection/non-detection data is sufficient to capture the majority of archaeal phylogeny. We subsequently use the sequence data from these groups to infer a method- and substitution-model-independent phylogeny. By holding this phylogeny constrained and interrogating the intersection of this large dataset with both the Eukarya and the Bacteria using Bayesian and maximum-likelihood approaches, we propose and provide evidence for a methanogenic origin of the Archaea. By the same criteria, we also provide evidence in support of an origin for Eukarya either within or as sister to the Thaumarchaea.
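To make the binary detection/non-detection idea concrete, the following is a minimal sketch and not the authors' pipeline. The genome names, orthologue group IDs, and the choice of Jaccard distance are all assumptions made for illustration; the abstract does not say which distance measure or tree-building method was applied to the binary data.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper):
# genome name -> set of orthologue group IDs detected in that genome.
detections = {
    "genome_A": {"OG1", "OG2", "OG3"},
    "genome_B": {"OG1", "OG2", "OG4"},
    "genome_C": {"OG4", "OG5"},
}


def binary_matrix(detections):
    """Return (sorted group IDs, {genome: list of 0/1 flags, one per group})."""
    groups = sorted(set().union(*detections.values()))
    rows = {
        genome: [1 if group in found else 0 for group in groups]
        for genome, found in detections.items()
    }
    return groups, rows


def jaccard_distance(a, b):
    """1 - (groups detected in both) / (groups detected in either)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def pairwise_distances(rows):
    """Distance for every unordered genome pair, keyed by sorted name tuple."""
    return {
        (p, q): jaccard_distance(rows[p], rows[q])
        for p, q in combinations(sorted(rows), 2)
    }


groups, rows = binary_matrix(detections)
distances = pairwise_distances(rows)
```

A distance table of this kind could then be passed to any distance-based tree-building method; genomes that share more orthologue groups end up closer together, which is the intuition behind recovering phylogenetic signal from presence/absence data alone.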