Folk Ryan A, Kates Heather R, LaFrance Raphael, Soltis Douglas E, Soltis Pamela S, Guralnick Robert P
Department of Biological Sciences, Mississippi State University, Mississippi State, Mississippi, USA.
Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA.
Appl Plant Sci. 2021 Feb 27;9(2):e11410. doi: 10.1002/aps3.11410. eCollection 2021 Feb.
Large phylogenetic data sets have often been restricted to small numbers of loci from GenBank, and a vetted sampling-to-sequencing phylogenomic protocol scaling to thousands of species is not yet available. Here, we report a high-throughput collections-based approach that empowers researchers to explore more branches of the tree of life with numerous loci.
We developed an integrated Specimen-to-Laboratory Information Management System (SLIMS), connecting sampling and wet lab efforts with progress tracking at each stage. Using unique identifiers encoded in QR codes and a taxonomic database, a research team can sample herbarium specimens, efficiently record the sampling event, and capture specimen images. After sampling in herbaria, images are uploaded to a citizen science platform for metadata generation, and tissue samples are moved through a simple, high-throughput, plate-based herbarium DNA extraction and sequencing protocol.
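The tracking idea described above, a unique identifier per sample plus a recorded stage, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the SLIMS implementation. The stage names, the `SampleRecord` fields, and the `register` and `progress` helpers are all hypothetical, and the taxonomic database is reduced to a plain set of accepted names. The identifier string is what one would encode in a QR code; generating the QR image itself needs a third-party library and is not shown.

```python
from dataclasses import dataclass, field
from enum import IntEnum
import uuid


class Stage(IntEnum):
    """Hypothetical workflow stages, in the order a sample passes through them."""
    SAMPLED = 1
    IMAGED = 2
    EXTRACTED = 3
    SEQUENCED = 4


@dataclass
class SampleRecord:
    taxon: str       # name already checked against the accepted-name list
    herbarium: str   # collection where the specimen was sampled
    # Unique identifier; this string would be the QR code payload.
    sample_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    stage: Stage = Stage.SAMPLED

    def advance(self, new_stage: Stage) -> None:
        """Move to the next stage; skipping ahead or going back is rejected."""
        if new_stage != self.stage + 1:
            raise ValueError(
                f"cannot move {self.sample_id} from {self.stage.name} "
                f"to {new_stage.name}"
            )
        self.stage = new_stage


def register(taxon: str, herbarium: str, accepted_names: set) -> SampleRecord:
    """Record a sampling event, refusing names absent from the accepted list."""
    if taxon not in accepted_names:
        raise ValueError(f"unknown taxon: {taxon}")
    return SampleRecord(taxon=taxon, herbarium=herbarium)


def progress(records) -> dict:
    """Count how many samples currently sit at each stage."""
    counts = {stage.name: 0 for stage in Stage}
    for record in records:
        counts[record.stage.name] += 1
    return counts


if __name__ == "__main__":
    # Illustrative names and herbarium code only, not data from the study.
    accepted = {"Alnus rubra", "Medicago truncatula"}
    samples = [register(name, "FLAS", accepted) for name in sorted(accepted)]
    samples[0].advance(Stage.IMAGED)
    print(progress(samples))
```

Rejecting out-of-order stage changes is a design choice of this sketch: it keeps the per-stage counts trustworthy when many people update records in parallel.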
We applied this sampling-to-sequencing workflow to ~15,000 species, producing for the first time a data set with ~50% taxonomic representation of the "nitrogen-fixing clade" of angiosperms.
The approach we present is appropriate at any taxonomic scale and is extensible to other collection types. The widespread use of large-scale sampling strategies repositions herbaria as accessible but largely untapped resources for broad taxonomic sampling with thousands of species.