The relational modeling of hierarchical data in biodiversity databases.

Affiliations

Department of Biology Education & Herbarium collections (PRC), Faculty of Science, Charles University, Viničná 7, Praha 128 00, Czech Republic.

Institute of Botany of the Czech Academy of Sciences, Zámek 1, Průhonice 252 43, Czech Republic.

Publication information

Database (Oxford). 2024 Oct 10;2024. doi: 10.1093/database/baae107.

Abstract

The unifying element of all biodiversity data is the modeling of the taxon hierarchy. We compared 25 existing databases in terms of how they handle the taxon hierarchy and how they present these data. Our primary sources of information were the documentation or demo installations of the databases. Next, we analyzed the data structures using the R packages provided by the inspected platforms. Where neither was available, we used the public interface of the individual databases. For almost half (12) of the databases analyzed, we found no formalized taxon hierarchy data structure. These databases provide only biological information about a taxon's membership in higher ranks, which cannot be fully formalized and is therefore not generally usable. Among the remaining providers, the least effective model dominates: the Adjacency List, which stores the parentId of each taxon. This study demonstrates that current biodiversity databases pay little attention to modeling the taxon hierarchy. In particular, they rarely make it available to researchers as a hierarchical data structure within the data they provide. Of the known data models, the Closure Table is the most suitable for biodiversity relational databases, and it also corresponds to the ontology concept. However, its use within the biodiversity database ecosystem remains sporadic.
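The two relational models named above can be contrasted with a small sketch. The sketch assumes Python's standard `sqlite3` module. The table and column names (`taxon`, `parent_id`, `taxon_closure`, `depth`) and the toy taxa are hypothetical illustrations; they are not the schema of any database in the study. In the Adjacency List, each taxon stores only its parent, so retrieving a lineage needs a recursive query. In the Closure Table, every ancestor/descendant pair is stored explicitly, so lineages and subtrees come from a single plain join.

```python
import sqlite3

# Hypothetical toy data (intermediate ranks omitted for brevity):
# (id, name, taxon_rank, parent_id); parent_id is None for the root.
TAXA = [
    (1, "Plantae", "kingdom", None),
    (2, "Asteraceae", "family", 1),
    (3, "Bellis", "genus", 2),
    (4, "Bellis perennis", "species", 3),
    (5, "Taraxacum", "genus", 2),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding both hierarchy models."""
    con = sqlite3.connect(":memory:")
    # Adjacency List: each taxon stores only the id of its direct parent.
    con.execute(
        "CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
        " taxon_rank TEXT, parent_id INTEGER REFERENCES taxon(id))"
    )
    con.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", TAXA)
    # Closure Table: one row per (ancestor, descendant) pair, including
    # each taxon paired with itself at depth 0.
    con.execute(
        "CREATE TABLE taxon_closure (ancestor_id INTEGER NOT NULL,"
        " descendant_id INTEGER NOT NULL, depth INTEGER NOT NULL,"
        " PRIMARY KEY (ancestor_id, descendant_id))"
    )
    # Derive every ancestor/descendant path once from the adjacency list.
    con.execute(
        """
        WITH RECURSIVE paths(ancestor_id, descendant_id, depth) AS (
            SELECT id, id, 0 FROM taxon
            UNION ALL
            SELECT p.ancestor_id, t.id, p.depth + 1
            FROM paths AS p JOIN taxon AS t ON t.parent_id = p.descendant_id
        )
        INSERT INTO taxon_closure (ancestor_id, descendant_id, depth)
        SELECT ancestor_id, descendant_id, depth FROM paths
        """
    )
    return con


def ancestors_adjacency(con: sqlite3.Connection, taxon_id: int) -> list:
    """Lineage (root first) via the Adjacency List: needs a recursive walk up."""
    rows = con.execute(
        """
        WITH RECURSIVE up(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM taxon WHERE id = ?
            UNION ALL
            SELECT t.id, t.name, t.parent_id, up.depth + 1
            FROM taxon AS t JOIN up ON t.id = up.parent_id
        )
        SELECT name FROM up WHERE depth > 0 ORDER BY depth DESC
        """,
        (taxon_id,),
    ).fetchall()
    return [name for (name,) in rows]


def ancestors_closure(con: sqlite3.Connection, taxon_id: int) -> list:
    """Lineage (root first) via the Closure Table: one non-recursive join."""
    rows = con.execute(
        """
        SELECT t.name FROM taxon_closure AS c
        JOIN taxon AS t ON t.id = c.ancestor_id
        WHERE c.descendant_id = ? AND c.depth > 0
        ORDER BY c.depth DESC
        """,
        (taxon_id,),
    ).fetchall()
    return [name for (name,) in rows]


def descendants_closure(con: sqlite3.Connection, taxon_id: int) -> list:
    """All lower taxa via the Closure Table, nearest ranks first."""
    rows = con.execute(
        """
        SELECT t.name FROM taxon_closure AS c
        JOIN taxon AS t ON t.id = c.descendant_id
        WHERE c.ancestor_id = ? AND c.depth > 0
        ORDER BY c.depth, t.name
        """,
        (taxon_id,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    con = build_db()
    print(ancestors_adjacency(con, 4))  # ['Plantae', 'Asteraceae', 'Bellis']
    print(ancestors_closure(con, 4))    # ['Plantae', 'Asteraceae', 'Bellis']
    print(descendants_closure(con, 2))  # ['Bellis', 'Taraxacum', 'Bellis perennis']
```

Both lineage queries return the same result for *Bellis perennis*. The difference lies in the trade-off. The Closure Table answers lineage and subtree questions without recursion. In exchange, it stores one extra row per ancestor of each taxon and must update those rows whenever a taxon is inserted or moved in the hierarchy.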

[Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7598/11466226/60a877853059/baae107f1.jpg]
