Accurate, automated taxonomic assignment of genebank accessions: a new method demonstrated using high-throughput marker data from 10,000 Capsicum spp. accessions.

Affiliations

Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Saxony-Anhalt, Germany.

The University of Melbourne, Melbourne, Australia.

Publication information

Theor Appl Genet. 2023 Sep 11;136(10):208. doi: 10.1007/s00122-023-04441-8.

DOI:10.1007/s00122-023-04441-8

PMID:37695370

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10495273/

Abstract

We demonstrate how an algorithm that uses cheap genetic marker data can ensure that the taxonomic assignments of genebank samples are complete, intuitive, and consistent, which enhances their value. To maximise the benefit of genebank resources, accurate and complete taxonomic assignments are imperative. The rise of genebank genomics allows genetic methods to be used to ensure this, but these methods need to be largely automated, since the number of samples is too great for efficient manual recategorisation. However, no clearly optimal method has yet arisen. A recent landmark genebank genomic study sequenced over 10,000 genebank accessions of peppers (Capsicum spp.), a genus of great commercial, cultural, and scientific importance that suffers from much taxonomic ambiguity. Similar datasets will, in coming decades, be produced for hundreds of plant taxa, affording a perfect opportunity to develop automated taxonomic correction methods in advance of the incipient genebank genomics explosion, alongside providing insights into pepper taxonomy in general. We present a marker-based taxonomic assignment approach that combines ideas from several standard classification algorithms, resulting in a highly flexible and customisable classifier suited to imposing intuitive assignments, even in highly reticulated species groups with complex population structures and evolutionary histories. Our classifier performs favourably compared with key alternative methods. Possible sensible alterations to pepper taxonomy based on the results are proposed for discussion by the relevant communities.
