Unsupervised machine learning for species delimitation, integrative taxonomy, and biodiversity conservation.

Affiliation

Department of Biological Sciences, The George Washington University, Washington, DC 20052 USA.

Publication information

Mol Phylogenet Evol. 2023 Dec;189:107939. doi: 10.1016/j.ympev.2023.107939. Epub 2023 Oct 5.

DOI:10.1016/j.ympev.2023.107939

Abstract

Integrative taxonomy, combining data from multiple axes of biologically relevant variation, is a major goal of systematics. Ideally, such taxonomies will derive from similarly integrative species-delimitation analyses. Yet, most current methods rely solely or primarily on molecular data, with other layers often incorporated only in a post hoc qualitative or comparative manner. A major limitation is the difficulty of devising quantitative parametric models linking different datasets in a unified ecological and evolutionary framework. Machine Learning (ML) methods offer flexibility in this arena by easily learning high-dimensional associations between observations (e.g., individual specimens) across a wide array of input features (e.g., genetics, geography, environment, and phenotype) to delimit statistically meaningful clusters. Here, I implement an unsupervised method using Self-Organizing (or "Kohonen") Maps (SOMs) for such purposes. Recent extensions called "SuperSOMs" can integrate multiple layers, each of which exerts independent influence on a two-dimensional output grid via empirically estimated weights. The grid cells are then delimited into K distinct units that can be interpreted as species or other entities. I show empirical examples in salamanders (Desmognathus) and snakes (Storeria) with layers representing alleles, space, climate, and traits. Simulations reveal that the SuperSOM approach can detect K = 1, tends not to over-split, reflects contributions from all layers, and limits large layers (e.g., genetic matrices) from overwhelming other datasets, desirable properties addressing major concerns from previous studies. Finally, I suggest that these and similar methods could integrate conservation-relevant layers such as population trends and human encroachment to delimit management units from an explicitly quantitative framework grounded in the ecology and evolution of species limits and boundaries.
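
The multi-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation. It assumes fixed, equal layer weights (the abstract describes weights that are estimated empirically), a per-layer mean squared distance so that a wide layer does not dominate simply by having more columns, and simulated data. The names `bmu_index` and `train_multilayer_som` are hypothetical, and the final step of grouping grid cells into K units is omitted.

```python
import numpy as np

def bmu_index(sample_layers, codebooks, weights):
    """Return the grid cell whose codebook vectors are closest to one specimen."""
    total = np.zeros(codebooks[0].shape[0])
    for x, cb, w in zip(sample_layers, codebooks, weights):
        # Mean (not summed) squared distance per layer: an assumption of this
        # sketch, so that layer width alone does not drive the result.
        total += w * ((cb - x) ** 2).mean(axis=1)
    return int(np.argmin(total))

def train_multilayer_som(layers, weights, grid=(3, 3), n_iter=3000, seed=0):
    """Train a toy multi-layer SOM; `layers` is a list of (n, p_l) arrays."""
    rng = np.random.default_rng(seed)
    n = layers[0].shape[0]
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    weights = np.asarray(weights, float) / np.sum(weights)
    # Initialise every layer's codebook from the same randomly drawn specimens.
    start = rng.choice(n, size=rows * cols, replace=n < rows * cols)
    codebooks = [np.array(X[start], dtype=float) for X in layers]
    for t in range(n_iter):
        frac = t / n_iter
        alpha = 0.5 * (1.0 - frac) + 0.01                   # decaying learning rate
        sigma = 0.5 * max(rows, cols) * (1.0 - frac) + 0.3  # shrinking neighbourhood
        i = rng.integers(n)
        best = bmu_index([X[i] for X in layers], codebooks, weights)
        grid_d2 = ((coords - coords[best]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        for X, cb in zip(layers, codebooks):
            cb += alpha * h[:, None] * (X[i] - cb)          # in-place codebook update
    return codebooks, weights

# Simulated example (hypothetical data): two groups, one wide and one narrow layer.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 20)
alleles = group[:, None] + rng.normal(0, 0.1, (40, 50))      # wide "genetic" layer
traits = 3.0 * group[:, None] + rng.normal(0, 0.3, (40, 2))  # narrow "trait" layer
layers = [alleles, traits]
codebooks, w = train_multilayer_som(layers, weights=[1.0, 1.0])
cells = np.array([bmu_index([X[i] for X in layers], codebooks, w)
                  for i in range(40)])
```

Here `cells` holds the grid cell assigned to each simulated specimen. Delimiting those cells into K units would require a separate clustering step, which this sketch leaves out.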

Similar articles

Speciation Hypotheses from Phylogeographic Delimitation Yield an Integrative Taxonomy for Seal Salamanders (Desmognathus monticola).

Syst Biol. 2023 May 19;72(1):179-197. doi: 10.1093/sysbio/syac065.

Species delimitation 4.0: integrative taxonomy meets artificial intelligence.

Trends Ecol Evol. 2024 Aug;39(8):771-784. doi: 10.1016/j.tree.2023.11.002. Epub 2024 Jun 6.

A demonstration of unsupervised machine learning in species delimitation.

Mol Phylogenet Evol. 2019 Oct;139:106562. doi: 10.1016/j.ympev.2019.106562. Epub 2019 Jul 16.

Incorporating color into integrative taxonomy: analysis of the varied tit (Sittiparus varius) complex in East Asia.

Syst Biol. 2014 Jul;63(4):505-17. doi: 10.1093/sysbio/syu016. Epub 2014 Mar 6.

Integrative species delimitation reveals cryptic diversity in the southern Appalachian Antrodiaetus unicolor (Araneae: Antrodiaetidae) species complex.

Mol Ecol. 2020 Jun;29(12):2269-2287. doi: 10.1111/mec.15483. Epub 2020 Jun 17.

Genomic data reveal deep genetic structure but no support for current taxonomic designation in a grasshopper species complex.

Mol Ecol. 2019 Sep;28(17):3869-3886. doi: 10.1111/mec.15189. Epub 2019 Aug 29.

The Warps and Wefts of a Polyploidy Complex: Integrative Species Delimitation of the Diploid (Compositae, Anthemideae) Representatives.

Plants (Basel). 2022 Jul 19;11(14):1878. doi: 10.3390/plants11141878.

Species are hypotheses: avoid connectivity assessments based on pillars of sand.

Mol Ecol. 2015 Feb;24(3):525-44. doi: 10.1111/mec.13048. Epub 2015 Jan 19.

The effect of missing data on coalescent species delimitation and a taxonomic revision of whipsnakes (Colubridae: Masticophis).

Mol Phylogenet Evol. 2018 Oct;127:356-366. doi: 10.1016/j.ympev.2018.03.018. Epub 2018 Mar 20.

Cited by

Conservation Evolutionary Biology: A Unified Framework Connecting the Past, Present, and Future of Biodiversity Conservation.

Mol Biol Evol. 2025 Jun 4;42(6). doi: 10.1093/molbev/msaf122.

Genetic diversity and agro-morphological characterization of cassava varieties provides insight for breeding and crop improvement.

Sci Rep. 2025 May 20;15(1):17498. doi: 10.1038/s41598-025-02527-5.

Phylogenomics of the rarest animals: a second species of Micrognathozoa identified by machine learning.

Proc Biol Sci. 2025 Feb;292(2041):20242867. doi: 10.1098/rspb.2024.2867. Epub 2025 Feb 19.

The genetic origins of species boundaries at subtropical and temperate ecoregions in the North American racers (Coluber constrictor).

Heredity (Edinb). 2025 Feb;134(2):87-97. doi: 10.1038/s41437-024-00737-7. Epub 2024 Nov 28.

Understanding species limits through the formation of phylogeographic lineages.

Ecol Evol. 2024 Oct 2;14(10):e70263. doi: 10.1002/ece3.70263. eCollection 2024 Oct.

Morphological Species Delimitation in The Western Pond Turtle (): Can Machine Learning Methods Aid in Cryptic Species Identification?

Integr Org Biol. 2024 Apr 2;6(1):obae010. doi: 10.1093/iob/obae010. eCollection 2024.
