Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil.
Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil.
Mol Ecol Resour. 2022 Apr;22(3):1016-1028. doi: 10.1111/1755-0998.13534. Epub 2021 Oct 31.
Delimiting species boundaries is a major goal in evolutionary biology. An increasing volume of literature has focused on the challenges of investigating cryptic diversity within complex evolutionary scenarios of speciation, including gene flow and demographic fluctuations. New methods based on model selection, such as approximate Bayesian computation (ABC), approximate likelihoods, and machine learning, are promising tools arising in this field. Here, we introduce a framework for species delimitation using the multispecies coalescent model coupled with a deep learning algorithm based on convolutional neural networks (CNNs). We compared this strategy with a similar ABC approach. We applied both methods to test species boundary hypotheses based on current and previous taxonomic delimitations as well as genetic data (sequences from 41 loci) in Pilosocereus aurisetus, a cactus species complex with a sky-island distribution and taxonomic uncertainty. To validate our method, we also applied the same strategy to data from widely accepted species of the genus Drosophila. The results show that our CNN approach has a high capacity to distinguish among the simulated species delimitation scenarios, with higher accuracy than ABC. For the cactus data set, a splitter hypothesis without gene flow showed the highest probability in both the CNN and ABC approaches, a result that agrees with previous taxonomic classifications and is in line with the sky-island distribution and low dispersal of P. aurisetus. Our results highlight the cryptic diversity within the P. aurisetus complex and show that CNNs are a promising approach for distinguishing complex evolutionary histories, even outperforming other model-based approaches such as ABC in accuracy.
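The simulation-based model selection that the abstract refers to can be illustrated with a toy rejection-ABC sketch. This is not the authors' pipeline: the study simulates sequence data under the multispecies coalescent, whereas the sketch below assumes a hypothetical one-number summary statistic (the absolute difference in means between two samples) and two invented models, "lump" (one population) and "split" (two diverged populations). All function names, priors, and tolerances are illustrative assumptions.

```python
import random
import statistics


def simulate_summary(model, rng, n=20):
    """Simulate one toy data set and return |mean(A) - mean(B)|.

    Hypothetical models (assumptions, not from the paper):
      "lump"  -> both samples come from the same distribution
      "split" -> sample B is shifted by an amount drawn from U(1, 3)
    """
    shift = 0.0 if model == "lump" else rng.uniform(1.0, 3.0)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(shift, 1.0) for _ in range(n)]
    return abs(statistics.fmean(a) - statistics.fmean(b))


def abc_model_choice(observed, models=("lump", "split"),
                     n_sims=2000, tol=0.1, seed=1):
    """Rejection ABC: keep simulations whose summary statistic lies
    within `tol` of the observed value, then estimate each model's
    posterior probability as its share of the accepted simulations
    (equal prior weight per model, since each gets n_sims draws)."""
    rng = random.Random(seed)
    accepted = {m: 0 for m in models}
    for m in models:
        for _ in range(n_sims):
            if abs(simulate_summary(m, rng) - observed) <= tol:
                accepted[m] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase tol or n_sims")
    return {m: accepted[m] / total for m in models}


# An observed statistic far from zero should favour the "split" model.
probs = abc_model_choice(observed=2.0)
print(probs)
```

A CNN-based approach, as described in the abstract, would replace the hand-picked summary statistic and the accept/reject step with a classifier trained on the simulated data sets and labelled by generating model; the simulate-then-compare structure stays the same.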