Department of Biology, University of Hawai'i, 2538 McCarthy Mall, Edmondson Hall 216, Honolulu, HI 96822, USA.
Department of Biological Sciences and Museum of Natural Science, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA.
Syst Biol. 2018 Mar 1;67(2):269-284. doi: 10.1093/sysbio/syx073.
The use of genetic data for identifying species-level lineages across the tree of life has received increasing attention in the field of systematics over the past decade. The multispecies coalescent (MSC) model provides a framework for understanding the process of lineage divergence and has become widely adopted for delimiting species. However, because these studies lack an explicit assessment of model fit, in many cases the accuracy of the inferred species boundaries is unknown. This is concerning given the large amount of empirical data and theory that highlight the complexity of the speciation process. Here, we seek to fill this gap by using simulation to characterize the sensitivity of inference under the MSC to several violations of model assumptions thought to be common in empirical data. We also assess the fit of the MSC model to empirical data in the context of species delimitation. Our results show substantial variation in model fit across data sets. Posterior predictive tests find the poorest model performance in data sets that were hypothesized to be impacted by model violations. We also show that while inferences assuming the MSC are robust to minor model violations, such inferences can be biased under some biologically plausible scenarios. Taken together, these results suggest that researchers can identify individual data sets in which species delimitation under the MSC is likely to be problematic, thereby highlighting the cases where additional lines of evidence to identify species boundaries are particularly important to collect. Our study supports a growing body of work highlighting the importance of model checking in phylogenetics, and the usefulness of tailoring tests of model fit to assess the reliability of particular inferences. [Population structure, gene flow, demographic changes, posterior prediction, simulation, genetics.]