Unlocking Index Animalium: From paper slips to bytes and bits.

Author information

Pilsk Suzanne C, Kalfatovic Martin R, Richard Joel M

Affiliation

Smithsonian Libraries, Washington DC, USA.

Publication information

ZooKeys. 2016 Jan 7;(550):153-71. doi: 10.3897/zookeys.550.9673. eCollection 2016.

Abstract

In 1996 Smithsonian Libraries (SIL) embarked on the digitization of its collections. By 1999, a full-scale digitization center was in place, and rare volumes from the natural history collections, often of high illustrative value, were the focus for the first years of the program. The resulting beautiful books made available for online display were successful to a certain extent, but it soon became clear that the data locked within the texts needed to be converted to a more usable and re-purposable form, using digitization methods that went beyond simple page imaging to include text conversion. Library staff met with researchers from the taxonomic community to understand their path to the literature and identified the tools (indexes and bibliographies) they used to connect to library holdings. The traditional library metadata describing the titles, which made them easily retrievable from library shelves, did not meet the needs of researchers looking for more detailed and granular data within the texts. The result was to identify print tools that could potentially assist researchers in digital form. This paper outlines the project undertaken to convert Charles Davies Sherborn's Index Animalium into a tool to connect researchers to library holdings: from a print index, to a database, and eventually to a dataset. Sherborn's microcitation of a species name and his bibliographies help bridge the gap between taxonomists and the literature holdings of libraries. In 2004, SIL received funding from the Smithsonian's Atherton Seidell Endowment to create an online version of Sherborn's Index Animalium. The initial project was to digitize the page images and re-key the data into a simple data structure. As the project evolved, a more complex database was developed that enabled quality field searching to retrieve species names and to search the bibliography.
Problems with inconsistent abbreviations and styling in his bibliographies made parsing the data difficult. With the development of the Biodiversity Heritage Library (BHL) in 2005, it became obvious that there was a need to integrate the database-converted Index Animalium, BHL's scanned taxonomic literature, and taxonomic intelligence (the algorithmic identification of binomial, Latinate name-strings). The challenges of working with legacy taxonomic citations, computer matching algorithms, and making connections have brought us to today's goal of making Sherborn available and linked to other datasets. To allow machine-to-machine communication, the data is being examined together with partners for possible transformation into RDF markup that meets the standards of Linked Open Data. SIL staff have partnered with Thomson Reuters and the Global Names Initiative to further enhance the Index Animalium dataset. Thomson Reuters staff are now working on integrating the species microcitations and species names into the ION: Index to Organism Names project; Richard Pyle (The Bishop Museum) is also working on further parsing of the text. The ultimate goal of the Index Animalium collaborative project is for researchers to go seamlessly from a species name in either ION or the scanned pages of Index Animalium to the digitized original description in BHL, connecting taxonomic researchers to original species descriptions with just a click.
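The abstract defines taxonomic intelligence as the algorithmic identification of binomial, Latinate name-strings. As a rough illustration only, the Python sketch below flags a capitalized word followed by a lowercase word with a Latin-looking ending. The regular expression, the suffix list and `find_binomials` are hypothetical; this is not the algorithm used by BHL or the Global Names Initiative, whose name-finding tools rely on much richer rules and name dictionaries.

```python
import re

# Hypothetical pattern: a capitalized word (candidate genus) followed by a
# lowercase word (candidate specific epithet), e.g. "Felis catus".
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]+)\b")

# Hypothetical Latinate endings used to filter out ordinary English words.
LATIN_ENDINGS = ("us", "um", "a", "is", "ae", "i", "ensis", "oides")


def find_binomials(text: str) -> list[str]:
    """Return candidate 'Genus epithet' name-strings found in text."""
    names = []
    for genus, epithet in CANDIDATE.findall(text):
        # Keep the pair only if the epithet is not trivially short and
        # ends like a Latin word; everything else is treated as English.
        if len(epithet) >= 3 and epithet.endswith(LATIN_ENDINGS):
            names.append(f"{genus} {epithet}")
    return names


print(find_binomials("The type of Felis catus was described."))
```

A heuristic this simple still misfires: an English phrase such as "The data" passes the suffix filter and would be reported as a name, which illustrates the kind of computer-matching challenge the abstract refers to.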
