J. F. Blumenbach Institute of Zoology and Anthropology, University of Göttingen, Göttingen, Germany.
Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, China.
Mol Ecol Resour. 2020 May;20(3). doi: 10.1111/1755-0998.13146. Epub 2020 Mar 4.
Genomic data sets are increasingly central to ecological and evolutionary biology, but comparatively few genomic resources are available for invertebrates. Powerful new computational tools and the rapidly decreasing cost of Illumina sequencing are beginning to change this, enabling rapid genome assembly and reference marker extraction. We developed and tested a practical workflow for building genomic resources in nonmodel groups, using real-world data from Collembola (springtails), one of the most dominant groups of soil animals on Earth. Using three existing and 11 newly generated genomes, we designed two universal molecular marker sets: benchmarking universal single-copy orthologues (BUSCOs) and ultraconserved elements (UCEs). Both marker types were tested in silico for marker capture success and phylogenetic performance. The new genomes were assembled from Illumina short reads, and 9,585-14,743 protein-coding genes were predicted per genome using ab initio and protein homology evidence. We identified 1,997 BUSCOs across the 14 genomes, and created and assessed a custom BUSCO data set for extracting single-copy genes. We also developed a new UCE probe set containing 46,087 baits targeting 1,885 loci. We successfully captured 1,437-1,865 BUSCOs and 975-1,186 UCEs per genome across the 14 genomes. Phylogenomic reconstructions using these markers proved robust, giving new insight into deep-time collembolan relationships. Our study demonstrates the feasibility of generating thousands of universal markers from highly efficient whole-genome sequencing, providing a valuable resource for genome-scale investigations in evolutionary biology and ecology.
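To put the reported capture counts in proportion, the following minimal sketch converts them to percentages. It assumes that the 1,997 identified BUSCOs and the 1,885 targeted UCE loci are the appropriate denominators, which the abstract does not state explicitly; the function and variable names are illustrative and not taken from the study.

```python
def capture_rate(captured: int, total: int) -> float:
    """Return captured/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * captured / total, 1)


# Figures quoted in the abstract. Using them as denominators is an assumption.
BUSCO_TOTAL = 1997   # BUSCOs identified across the 14 genomes
UCE_LOCI = 1885      # loci targeted by the UCE probe set
UCE_BAITS = 46087    # baits in the UCE probe set

# Lowest and highest capture counts reported across the 14 genomes.
busco_range = (capture_rate(1437, BUSCO_TOTAL), capture_rate(1865, BUSCO_TOTAL))
uce_range = (capture_rate(975, UCE_LOCI), capture_rate(1186, UCE_LOCI))

# Average number of baits designed per targeted locus.
baits_per_locus = round(UCE_BAITS / UCE_LOCI, 1)

print(f"BUSCO capture: {busco_range[0]}% to {busco_range[1]}%")
print(f"UCE capture: {uce_range[0]}% to {uce_range[1]}%")
print(f"Mean baits per UCE locus: {baits_per_locus}")
```

Under these assumptions, BUSCO capture is consistently higher than UCE capture, and each UCE locus is covered by roughly two dozen baits on average.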