Department of Biology Education & Herbarium collections (PRC), Faculty of Science, Charles University, Viničná 7, Praha 128 00, Czech Republic.
Institute of Botany of the Czech Academy of Sciences, Zámek 1, Průhonice 252 43, Czech Republic.
Database (Oxford). 2024 Oct 10;2024. doi: 10.1093/database/baae107.
Taxon hierarchy modeling is the element common to all biodiversity data. We compared 25 existing databases in terms of how they handle the taxon hierarchy and how they present these data. Our primary source of information was the documentation or demo installations of the databases; where these were unavailable, we analyzed the data structures using R packages provided by the inspected platforms. If neither was available, we used the public interface of the individual database. For almost half (12) of the databases analyzed, we found no formalized data structure for the taxon hierarchy; these databases provide only biological information about a taxon's membership in higher ranks, which is not fully formalizable and thus not generally usable. Among the remaining providers, the least effective model, the Adjacency List (storing the parentId of a taxon), dominates. This study demonstrates how little attention current biodiversity databases pay to modeling the taxon hierarchy, and in particular to making it available to researchers as a hierarchical data structure within the data provided. For relational biodiversity databases, the Closure Table is the most suitable of the known data models, and it also corresponds to the ontology concept. However, its use within the biodiversity database ecosystem is rather sporadic.
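To make the contrast between the two models concrete, the following is a minimal sketch rather than the schema of any database surveyed. It assumes a toy four-taxon hierarchy and uses SQLite through Python's standard `sqlite3` module; the table names `taxon` and `taxon_closure`, the column names, and the helper functions are hypothetical. The Adjacency List stores only a parent reference per taxon, so retrieving a full lineage needs a recursive query, whereas the Closure Table stores every ancestor-descendant pair and answers the same question with a single join.

```python
import sqlite3

# Hypothetical toy taxonomy: (id, name, parent_id); parents listed before children.
TAXA = [
    (1, "Plantae", None),
    (2, "Rosaceae", 1),
    (3, "Rosa", 2),
    (4, "Rosa canina", 3),
]


def build(conn):
    """Create both representations side by side (illustrative schema only)."""
    conn.executescript("""
        CREATE TABLE taxon (
            id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
        CREATE TABLE taxon_closure (
            ancestor INTEGER, descendant INTEGER, depth INTEGER,
            PRIMARY KEY (ancestor, descendant));
    """)
    for tid, name, parent in TAXA:
        # Adjacency List: one row per taxon with a pointer to its parent.
        conn.execute("INSERT INTO taxon VALUES (?, ?, ?)", (tid, name, parent))
        # Closure Table: every taxon is its own ancestor at depth 0 ...
        conn.execute("INSERT INTO taxon_closure VALUES (?, ?, 0)", (tid, tid))
        # ... and inherits all ancestors of its parent, one level deeper.
        if parent is not None:
            conn.execute(
                """INSERT INTO taxon_closure
                   SELECT ancestor, ?, depth + 1
                   FROM taxon_closure WHERE descendant = ?""",
                (tid, parent),
            )


def ancestors_adjacency(conn, tid):
    """Lineage from the Adjacency List: requires a recursive CTE."""
    rows = conn.execute(
        """WITH RECURSIVE up(id, name, parent_id, lvl) AS (
               SELECT id, name, parent_id, 0 FROM taxon WHERE id = ?
               UNION ALL
               SELECT t.id, t.name, t.parent_id, up.lvl + 1
               FROM taxon t JOIN up ON t.id = up.parent_id)
           SELECT name FROM up WHERE lvl > 0 ORDER BY lvl DESC""",
        (tid,),
    ).fetchall()
    return [r[0] for r in rows]


def ancestors_closure(conn, tid):
    """Lineage from the Closure Table: a single non-recursive join."""
    rows = conn.execute(
        """SELECT t.name
           FROM taxon_closure c JOIN taxon t ON t.id = c.ancestor
           WHERE c.descendant = ? AND c.depth > 0
           ORDER BY c.depth DESC""",
        (tid,),
    ).fetchall()
    return [r[0] for r in rows]


def descendants_closure(conn, tid):
    """All taxa below a given taxon, again without recursion."""
    rows = conn.execute(
        """SELECT descendant FROM taxon_closure
           WHERE ancestor = ? AND depth > 0 ORDER BY descendant""",
        (tid,),
    ).fetchall()
    return [r[0] for r in rows]
```

With an in-memory connection (`conn = sqlite3.connect(":memory:")` followed by `build(conn)`), both ancestor functions return the same lineage for a given taxon. The trade-off in this sketch is that the Closure Table needs extra rows and extra work on insert, in exchange for simpler queries over whole subtrees and lineages.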