Jones Andrew C, White Richard J, Orme Ewen R
Cardiff School of Computer Science & Informatics, Cardiff University, Queen's Buildings, 5 The Parade, Cardiff CF24 3AA, UK.
J Biomed Semantics. 2011 Oct 17;2(1):7. doi: 10.1186/2041-1480-2-7.
In this paper we describe our experience of adding globally unique identifiers to the Species 2000 and ITIS Catalogue of Life, an on-line index of organisms which is intended, ultimately, to cover all the world's known species. The scientific species names held in the Catalogue are names that already play an extensive role as terms in the organisation of information about living organisms in bioinformatics and other domains, but the effectiveness of their use is hindered by variation in individuals' opinions and understanding of these terms; indeed, in some cases more than one name will have been used to refer to the same organism. This means that it is desirable to be able to give unique labels to each of these differing concepts within the catalogue and to be able to determine which concepts are being used in other systems, in order that they can be associated with the concepts in the catalogue. Not only is this needed, but it is also necessary to know the relationships between alternative concepts that scientists might have employed, as these determine what can be inferred when data associated with related concepts is being processed. A further complication is that the catalogue itself is evolving as scientific opinion changes due to an increasing understanding of life.
We describe how we are using Life Science Identifiers (LSIDs) as globally unique identifiers in the Catalogue of Life, explaining how the mapping to species concepts is performed, how concepts are associated with specific editions of the catalogue, and how the Taxon Concept Schema has been adopted in order to express information about concepts and their relationships. We explore the implications of using globally unique identifiers in order to refer to abstract concepts such as species, which incorporate at least a measure of subjectivity in their definition, in contrast with the more traditional use of such identifiers to refer to more tangible entities, events, documents, observations, etc.
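To make the identifier structure concrete, the following is a minimal sketch of how client software might take an LSID apart. It relies only on the general LSID URN layout (`urn:lsid:authority:namespace:object`, with an optional trailing revision). The authority `example.org`, the namespace `taxon`, the object and revision values, and the helper names `parse_lsid` and `same_object_ignoring_revision` are all hypothetical. In particular, carrying a catalogue edition in the revision component is an assumption made here for illustration, not a statement of the Catalogue of Life's actual scheme.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of an LSID URN: urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the optional revision part is absent


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components, raising ValueError if malformed."""
    parts = text.split(":")
    # Expect 5 parts (no revision) or 6 parts (with revision).
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    # Treat a missing or empty sixth part as "no revision".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


def same_object_ignoring_revision(a: str, b: str) -> bool:
    """Hypothetical helper: True if two LSIDs differ at most in their revision."""
    x, y = parse_lsid(a), parse_lsid(b)
    return x[:3] == y[:3]


if __name__ == "__main__":
    # Illustrative values only; "ed2010" stands in for an assumed edition label.
    lsid = parse_lsid("urn:lsid:example.org:taxon:abc123:ed2010")
    print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```

Keeping the revision separate from the object identifier lets software decide for itself whether two identifiers from different editions should be treated as referring to the same record. Whether they denote the same taxon concept is a question the identifier alone cannot answer, and depends on the concept relationships published alongside it.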
A major reason for adopting identifiers such as LSIDs is to facilitate data integration. We have demonstrated the incorporation of LSIDs into the Catalogue of Life, in a manner consistent with the biodiversity informatics community's conventions for LSID use. The Catalogue of Life is therefore available as a taxonomy of organisms for use within various disciplines, including biomedical research, by software written with an awareness of these conventions.