Department of Biological Sciences, The George Washington University, Washington, DC 20052 USA.
Mol Phylogenet Evol. 2023 Dec;189:107939. doi: 10.1016/j.ympev.2023.107939. Epub 2023 Oct 5.
Integrative taxonomy, combining data from multiple axes of biologically relevant variation, is a major goal of systematics. Ideally, such taxonomies will derive from similarly integrative species-delimitation analyses. Yet, most current methods rely solely or primarily on molecular data, with other layers often incorporated only in a post hoc qualitative or comparative manner. A major limitation is the difficulty of devising quantitative parametric models linking different datasets in a unified ecological and evolutionary framework. Machine Learning (ML) methods offer flexibility in this arena by easily learning high-dimensional associations between observations (e.g., individual specimens) across a wide array of input features (e.g., genetics, geography, environment, and phenotype) to delimit statistically meaningful clusters. Here, I implement an unsupervised method using Self-Organizing (or "Kohonen") Maps (SOMs) for such purposes. Recent extensions called "SuperSOMs" can integrate multiple layers, each of which exerts independent influence on a two-dimensional output grid via empirically estimated weights. The grid cells are then delimited into K distinct units that can be interpreted as species or other entities. I show empirical examples in salamanders (Desmognathus) and snakes (Storeria) with layers representing alleles, space, climate, and traits. Simulations reveal that the SuperSOM approach can detect K = 1, tends not to over-split, reflects contributions from all layers, and limits large layers (e.g., genetic matrices) from overwhelming other datasets, desirable properties addressing major concerns from previous studies. Finally, I suggest that these and similar methods could integrate conservation-relevant layers such as population trends and human encroachment to delimit management units from an explicitly quantitative framework grounded in the ecology and evolution of species limits and boundaries.
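The multi-layer idea described in the abstract can be illustrated with a small sketch: each specimen carries several layers (here a toy "genetics" layer and a toy "space" layer), each grid cell holds one codebook vector per layer, and the best-matching cell is chosen by a weighted sum of per-layer distances. This is not the author's implementation. The function names (`layer_dist`, `best_unit`, `train_multilayer_som`), the toy data, the fixed equal layer weights, the linear decay schedules, and the per-layer mean squared distance are all assumptions made for illustration. The abstract states that layer weights are estimated empirically, which this sketch does not attempt, and the final step of delimiting grid cells into K units is omitted.

```python
import math
import random


def layer_dist(a, b):
    """Mean squared difference (assumed normalisation), so a layer's
    number of columns does not inflate its distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_unit(codebooks, sample, weights):
    """Index of the grid cell closest to `sample`, summing weighted layer distances."""
    def total(unit):
        return sum(w * layer_dist(cb, x) for w, cb, x in zip(weights, unit, sample))
    return min(range(len(codebooks)), key=lambda k: total(codebooks[k]))


def train_multilayer_som(samples, weights, rows=2, cols=2, epochs=40, seed=0):
    """samples[i] is a list of layers; each layer is a list of floats.
    Layers are assumed to be pre-scaled to comparable ranges."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each cell from a randomly chosen specimen (copied, not aliased).
    codebooks = [[list(layer) for layer in rng.choice(samples)] for _ in cells]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            alpha = 0.5 * (1.0 - frac) + 0.01  # learning rate decays linearly
            radius = 0.5 + (max(rows, cols) / 2) * (1.0 - frac)  # neighbourhood shrinks
            bmu = best_unit(codebooks, samples[i], weights)
            for k, (r, c) in enumerate(cells):
                d = math.hypot(r - cells[bmu][0], c - cells[bmu][1])
                if d > radius:
                    continue
                h = alpha * math.exp(-d * d / (2.0 * radius * radius))
                # Every layer of the cell moves toward the specimen by the same factor.
                for cb, x in zip(codebooks[k], samples[i]):
                    for j in range(len(cb)):
                        cb[j] += h * (x[j] - cb[j])
            step += 1
    return codebooks


# Toy data (hypothetical): two well-separated groups, 20 "allele" columns
# and 2 "space" columns per specimen.
data_rng = random.Random(1)


def specimen(allele_freq, centre):
    genetics = [allele_freq + data_rng.uniform(-0.05, 0.05) for _ in range(20)]
    space = [centre + data_rng.uniform(-0.05, 0.05) for _ in range(2)]
    return [genetics, space]


group_a = [specimen(0.1, 0.0) for _ in range(6)]
group_b = [specimen(0.9, 1.0) for _ in range(6)]
samples = group_a + group_b
weights = [0.5, 0.5]  # fixed equal weights; an assumption of this sketch

codebooks = train_multilayer_som(samples, weights)
cells_a = {best_unit(codebooks, s, weights) for s in group_a}
cells_b = {best_unit(codebooks, s, weights) for s in group_b}
print("cells used by group A:", sorted(cells_a))
print("cells used by group B:", sorted(cells_b))
```

Because each layer's distance is averaged over its own columns before weighting, the 20-column genetic layer and the 2-column spatial layer contribute on the same scale, which is one simple way to keep a large matrix from dominating the smaller layers. A real analysis would then group the occupied grid cells into K units, a step left out here.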