Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, United States.
Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, IL, 60637, United States.
Sci Data. 2019 Jul 3;6(1):104. doi: 10.1038/s41597-019-0049-y.
Offspring size is a fundamental trait in disparate biological fields of study. This trait can be measured as the size of plant seeds, animal eggs, or live young, and it influences ecological interactions, organism fitness, maternal investment, and embryonic development. Although multiple evolutionary processes have been predicted to drive the evolution of offspring size, the phylogenetic distribution of this trait remains poorly understood, due to the difficulty of reliably collecting and comparing offspring size data from many species. Here we present a dataset of 10,449 morphological descriptions of insect eggs, with records for 6,706 unique insect species and representatives from every extant hexapod order. The dataset includes eggs whose volumes span more than eight orders of magnitude. We created this dataset by partially automating the extraction of egg traits from the primary literature. In the process, we overcame challenges associated with large-scale phenotyping by designing and employing custom bioinformatic solutions to common problems. We matched the taxa in this dataset to the currently accepted scientific names in taxonomic and genetic databases, which will facilitate the use of these data for testing pressing evolutionary hypotheses in offspring size evolution.