Department of Integrative Biology, University of California, 3040 Valley Life Sciences Building, Berkeley, CA 94720-3140, USA
University of California Museum of Paleontology, University of California, 1101 Valley Life Sciences Building, Berkeley, CA 94720-4780, USA.
Biol Lett. 2018 Sep;14(9). doi: 10.1098/rsbl.2018.0431.
Large-scale analysis of the fossil record requires aggregation of palaeontological data from individual fossil localities. Prior to computers, these synoptic datasets were compiled by hand, a laborious undertaking that took years of effort and forced palaeontologists to make difficult choices about what types of data to tabulate. The advent of desktop computers ushered in palaeontology's first digital revolution: online literature-based databases, such as the Paleobiology Database (PBDB). However, the published literature represents only a small proportion of the palaeontological data housed in museum collections. Although this issue has long been appreciated, the magnitude, and thus potential significance, of these so-called 'dark data' has been difficult to determine. Here, in the early phases of a second digital revolution in palaeontology (the digitization of museum collections), we provide an estimate of the magnitude of palaeontology's dark data. Digitization of our nine institutions' holdings of Cenozoic marine invertebrate collections from California, Oregon and Washington in the USA reveals that they represent 23 times the number of unique localities currently available in the PBDB. These data, and the vast quantity of similarly untapped dark data in other museum collections, will, when digitally mobilized, enhance palaeontologists' ability to make inferences about the patterns and processes of past evolutionary and ecological changes.