National Institute for Mathematical and Biological Synthesis, University of Tennessee, Knoxville, TN 37996, USA.
Department of Biological Sciences, Virginia Tech, 4076 Derring Hall, 926 West Campus Drive, Blacksburg, VA 24061, USA.
Syst Biol. 2019 Sep 1;68(5):698-716. doi: 10.1093/sysbio/syz005.
Modeling discrete phenotypic traits for either ancestral character state reconstruction or morphology-based phylogenetic inference suffers from ambiguities in character coding, homology assessment, dependencies between characters, and selection of adequate models. These drawbacks occur because trait evolution is driven by two key processes, hierarchical and hidden, which are not accommodated simultaneously by the available phylogenetic methods. The hierarchical process refers to the dependencies between anatomical body parts, while the hidden process refers to the evolution of gene regulatory networks (GRNs) underlying trait development. Herein, I demonstrate that these processes can be efficiently modeled using structured Markov models (SMMs) equipped with hidden states, which resolves the majority of the problems associated with discrete traits. Integration of SMMs with anatomy ontologies can adequately incorporate the hierarchical dependencies, while the use of hidden states accommodates the hidden evolution of GRNs and substitution rate heterogeneity. I assess the new models using simulations and theoretical synthesis. The new approach solves the long-standing "tail color problem," in which the trait is scored for species with tails of different colors or no tails. It also reveals a previously unknown issue called the "two-scientist paradox," in which the way the trait is coded and the hidden processes driving the trait's evolution are confounded; failing to account for the hidden process may result in bias, which can be avoided by using hidden state models. Together, these results provide a clear guideline for coding traits into characters. This article gives practical examples of using the new framework for phylogenetic inference and comparative analysis.
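As a rough illustration of the tail color problem, the sketch below combines two binary characters (tail absent/present and color red/blue) into one four-state rate matrix, assuming the two characters evolve independently, and then maps the two "absent" states onto a single observable "no tail" state so that color acts as a hidden state whenever the tail is missing. The rate values, the function name `amalgamate_independent`, and the state ordering are assumptions made for this example only; they are not taken from the article, and the article's own parameterization may differ.

```python
import numpy as np


def amalgamate_independent(q_a, q_b):
    """Joint rate matrix for two characters assumed to evolve independently.

    Uses the Kronecker sum, so a change in only one character at a time
    has a nonzero rate; simultaneous changes in both have rate zero.
    """
    n_a, n_b = q_a.shape[0], q_b.shape[0]
    return np.kron(q_a, np.eye(n_b)) + np.kron(np.eye(n_a), q_b)


# Hypothetical rates, chosen only for illustration.
q_tail = np.array([[-0.1, 0.1],    # absent -> present
                   [0.2, -0.2]])   # present -> absent
q_color = np.array([[-0.3, 0.3],   # red -> blue
                    [0.4, -0.4]])  # blue -> red

# Joint states, ordered as (tail, color).
states = ["absent/red", "absent/blue", "present/red", "present/blue"]
q_joint = amalgamate_independent(q_tail, q_color)

# Observable coding: color cannot be scored when the tail is absent,
# so both "absent" states are lumped into one observed state.
observed = {"no tail": [0, 1], "red tail": [2], "blue tail": [3]}

print(np.round(q_joint, 2))
```

In this sketch, a species scored as "no tail" is treated as being in either of the two hidden "absent" states, which is one way to avoid coding tail color as missing or inapplicable data.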