Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data.

Authors

Derkarabetian Shahan, Starrett James, Hedin Marshal

Affiliations

Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, 26 Oxford St., Cambridge, MA, 02138, USA.

Department of Entomology and Nematology, University of California, Davis, Briggs Hall, Davis, CA, 95616-5270, USA.

Publication information

Front Zool. 2022 Feb 22;19(1):8. doi: 10.1186/s12983-022-00453-0.

Abstract

The diversity of biological and ecological characteristics of organisms, and the underlying genetic patterns and processes of speciation, make the development of universally applicable genetic species delimitation methods challenging. Many approaches, like those incorporating the multispecies coalescent, sometimes delimit populations rather than species and so overestimate species numbers. This issue is exacerbated in taxa with inherently high population structure due to low dispersal ability, and in cryptic species resulting from nonecological speciation. These taxa present a conundrum when delimiting species: analyses rely heavily, if not entirely, on genetic data, which oversplit species, while other lines of evidence lump them. We showcase this conundrum in the harvestman Theromaster brunneus, a low-dispersal taxon with a wide geographic distribution and high potential for cryptic species. Integrating morphological, mitochondrial, and sub-genomic (double-digest RADSeq and ultraconserved elements) data, we find high discordance across analyses and data types in the number of inferred species, with further evidence that multispecies coalescent approaches oversplit. We demonstrate the power of a supervised machine learning approach in effectively delimiting cryptic species by creating a "custom" training data set derived from a well-studied lineage with biological characteristics similar to those of Theromaster. This novel approach uses known taxa with particular biological characteristics to inform unknown taxa with similar characteristics, using modern computational tools ideally suited for species delimitation. The approach also considers the natural history of organisms to make more biologically informed species delimitation decisions, and in principle is broadly applicable to taxa across the tree of life.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1dd/8862334/2e8a81d02100/12983_2022_453_Fig1_HTML.jpg
