Guralnick Robert P, Zermoglio Paula F, Wieczorek John, LaFrance Raphael, Bloom David, Russell Laura
University of Florida Museum of Natural History, University of Florida at Gainesville, Gainesville, FL, USA.
Departamento de Ecología, Genética y Evolución, Instituto IEGEBA (CONICET-UBA), Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina.
Database (Oxford). 2016 Dec 26;2016. doi: 10.1093/database/baw158. Print 2016.
For vast areas of the globe and large parts of the tree of life, the data needed to characterize trait diversity are incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which only provide species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. Perhaps under-appreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools to collate a full set of trait descriptions and extract two key traits, body length and mass, from >18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested the success rate of these tools against hand-checked validation data sets and characterized the quality and quantity of the extracted data. A post-processing toolkit was developed to standardize and harmonize data sets, and to integrate this improved content into VertNet for broadest reuse. The result of this work was to add more than 1.5 million harmonized measurements of vertebrate body mass and length directly to specimen records. Rates of false positives and negatives for extracted data were extremely low. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for view and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as tools introduced here become part of publication workflows.
We close by noting how this effort extends to new communities already developing similar digitized content. Database URL: http://portal.vertnet.org/search?advanced=1.
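As an illustration of the kind of extraction and harmonization described above, the following is a minimal sketch of pulling a body length and a body mass out of free-text specimen notes and converting them to common units. It is not the authors' software: the function name `extract_traits`, the keyword patterns, the unit tables, and the output keys `length_mm` and `mass_g` are all assumptions made for this example, and real specimen records are far more heterogeneous than these patterns allow.

```python
import re

# Hypothetical unit conversion tables (assumptions for this sketch,
# not taken from the paper): everything is harmonized to mm and g.
LENGTH_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}
MASS_TO_G = {"mg": 0.001, "g": 1.0, "kg": 1000.0, "lb": 453.592, "oz": 28.3495}

# A plain decimal number such as "12" or "12.5".
NUMBER = r"(\d+(?:\.\d+)?)"

# Keyword, optional ":" or "=", a number, then a unit.
# Longer keywords and units are listed first so they win the alternation.
LENGTH_RE = re.compile(
    r"\b(?:total\s+length|body\s+length|length|tl)\b\s*[:=]?\s*"
    + NUMBER + r"\s*(mm|cm|m|in)\b",
    re.IGNORECASE,
)
MASS_RE = re.compile(
    r"\b(?:body\s+mass|mass|weight|wt)\b\s*[:=]?\s*"
    + NUMBER + r"\s*(mg|kg|g|lb|oz)\b",
    re.IGNORECASE,
)


def extract_traits(text):
    """Return the first length (mm) and mass (g) found in text, if any."""
    result = {}
    match = LENGTH_RE.search(text)
    if match:
        value, unit = float(match.group(1)), match.group(2).lower()
        result["length_mm"] = value * LENGTH_TO_MM[unit]
    match = MASS_RE.search(text)
    if match:
        value, unit = float(match.group(1)), match.group(2).lower()
        result["mass_g"] = value * MASS_TO_G[unit]
    return result


if __name__ == "__main__":
    print(extract_traits("Total length: 12.5 cm; weight 30 g"))
    print(extract_traits("collected near river, no measurements"))
```

A record with no recognizable measurement yields an empty dictionary, which keeps "nothing found" distinct from a measured value of zero. Checking such output against hand-verified records, as the abstract describes, is what would reveal the false positives and negatives a pattern set like this produces.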