Laurenne Nina, Tuominen Jouni, Saarenmaa Hannu, Hyvönen Eero
Semantic Computing Research Group (SeCo), Department of Media Technology, Aalto University, P.O. Box 15500, 00076 Aalto, Espoo, Finland.
Digitarium, University of Eastern Finland, P.O. Box 111, 80101 Joensuu, Finland.
J Biomed Semantics. 2014 Sep 8;5:40. doi: 10.1186/2041-1480-5-40. eCollection 2014.
The scientific names of plants and animals play a major role in the life sciences, because information is indexed, integrated, and searched using them. The main problem with names is their ambiguity: more than one name may point to the same taxon, and multiple taxa may share the same name. In addition, scientific names change over time, which leaves them open to various interpretations. Applying machine-understandable semantics to these names enables efficient processing of biological content in information systems. The first step is to use unique persistent identifiers instead of name strings when referring to taxa. The most commonly used identifiers are Life Science Identifiers (LSIDs), which are traditionally used in relational databases, and more recently HTTP URIs, which are applied on the Semantic Web by Linked Data applications.
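The ambiguity described above can be illustrated with a minimal sketch. The identifiers, name strings, and function names below are hypothetical placeholders and are not taken from the checklists discussed here; the sketch only shows why a name string alone cannot serve as a key for a taxon.

```python
from collections import defaultdict

# Hypothetical name usages: (taxon identifier, name string).
# Both the example.org URIs and the names are made up for illustration.
NAME_USAGES = [
    ("http://example.org/taxon/1", "Aus bus"),
    ("http://example.org/taxon/1", "Aus cus"),  # second name for the same taxon
    ("http://example.org/taxon/2", "Aus bus"),  # same name string, different taxon
]


def index_by_name(usages):
    """Map each name string to the set of taxon identifiers it may refer to."""
    by_name = defaultdict(set)
    for taxon_id, name in usages:
        by_name[name].add(taxon_id)
    return by_name


def index_by_taxon(usages):
    """Map each taxon identifier to the set of names applied to it."""
    by_taxon = defaultdict(set)
    for taxon_id, name in usages:
        by_taxon[taxon_id].add(name)
    return by_taxon


def ambiguous_names(usages):
    """Return the name strings that point to more than one taxon."""
    return sorted(
        name for name, ids in index_by_name(usages).items() if len(ids) > 1
    )
```

In this toy data set, looking up a taxon by its identifier returns every name applied to it, whereas looking up the name "Aus bus" returns two candidate taxa.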
We introduce two models for expressing taxonomic information in the form of species checklists. First, we show how species checklists are presented in a relational database system using LSIDs. Then, to obtain a more detailed representation of taxonomic information, we introduce the meta-ontology TaxMeOn, which models the same content as Semantic Web ontologies in which taxa are identified using HTTP URIs. We also explore how changes in scientific names can be managed over time.
The use of HTTP URIs is preferable for presenting the taxonomic information of species checklists. Unlike an LSID, an HTTP URI both identifies a taxon and operates as a web address from which additional information about the taxon can be located. This enables the integration of biological data from different sources on the web using Linked Data principles and prevents the formation of information silos. The Linked Data approach allows a user to assemble information and evaluate the complexity of taxonomic data based on conflicting views of taxonomic classifications. Using HTTP URIs and Semantic Web technologies also facilitates the representation of the semantics of biological data and, in this way, the creation of more "intelligent" biological applications and services.
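The integration argument can be sketched in a few lines, assuming each source publishes statements as (subject, property, value) triples keyed by the same HTTP URI. The URIs, property names, values, and helper functions below are hypothetical and do not reproduce the TaxMeOn vocabulary; plain tuples stand in for RDF to keep the sketch self-contained.

```python
from collections import defaultdict

# Hypothetical statements from two independent sources about one taxon URI.
SOURCE_A = [
    ("http://example.org/taxon/1", "scientificName", "Aus bus"),
    ("http://example.org/taxon/1", "rank", "species"),
]
SOURCE_B = [
    ("http://example.org/taxon/1", "scientificName", "Aus cus"),
    ("http://example.org/taxon/1", "occursIn", "Finland"),
]


def merge_sources(*sources):
    """Group all statements by subject URI and property across sources."""
    merged = defaultdict(lambda: defaultdict(set))
    for triples in sources:
        for subject, prop, value in triples:
            merged[subject][prop].add(value)
    return merged


def conflicting_properties(merged, subject):
    """List properties for which the sources give more than one value."""
    return sorted(
        prop for prop, values in merged[subject].items() if len(values) > 1
    )
```

Because both sources use the same URI for the taxon, their statements combine without string matching on names, and the properties on which they disagree can be listed for a user to evaluate.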