Biospytial: spatial graph-based computing for ecological Big Data.

Author affiliations

Lancaster Environment Centre, Lancaster University, Library Avenue, Lancaster, LA1 4YQ, UK.

Centre for Health Informatics, Computing and Statistics (CHICAS), Lancaster Medical School, Faculty of Health and Medicine, Furness Building, Lancaster University, Lancaster, LA1 4YQ, UK.

Publication information

Gigascience. 2020 May 1;9(5). doi: 10.1093/gigascience/giaa039.

DOI: 10.1093/gigascience/giaa039

PMID: 32391910

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7213554/

Abstract

BACKGROUND

The exponential accumulation of environmental and ecological data, together with the adoption of open data initiatives, brings opportunities and challenges for integrating and synthesising relevant knowledge. These need to be addressed, given the ongoing environmental crises.

FINDINGS

Here we present Biospytial, a modular open source knowledge engine designed to import, organise, analyse and visualise big spatial ecological datasets using the power of graph theory. The engine uses a hybrid graph-relational approach to store and access information. A graph data structure uses linkage relationships to build semantic structures represented as complex data structures stored in a graph database, while tabular and geospatial data are stored in an efficient spatial relational database system. We provide an application using information on species occurrences, their taxonomic classification and climatic datasets. We built a knowledge graph of the Tree of Life embedded in an environmental and geographical grid to perform an analysis on threatened species co-occurring with jaguars (Panthera onca).
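
The co-occurrence analysis described above can be pictured with a small sketch. This is not Biospytial's API: the `KnowledgeGraph` class, the relation names `IS_A` and `IS_IN`, the placeholder species and families, the cell identifiers and the threatened flags are all assumptions made up for illustration. It only shows how linking species to both taxonomic nodes and grid cells lets a co-occurrence query become a two-step graph traversal.

```python
from collections import defaultdict

# Hypothetical toy records: (species, family, grid cell, threatened flag).
# Names other than "Panthera onca" and all flags are placeholders, not data
# from the paper.
OCCURRENCES = [
    ("Panthera onca", "Felidae", "cell_1", False),
    ("sp_A", "family_X", "cell_1", True),
    ("sp_B", "family_Y", "cell_1", False),
    ("sp_C", "family_X", "cell_2", True),
    ("Panthera onca", "Felidae", "cell_3", False),
    ("sp_D", "family_Z", "cell_3", False),
]


class KnowledgeGraph:
    """Undirected labelled graph stored as adjacency sets."""

    def __init__(self):
        # node -> set of (relation, neighbour) pairs
        self.edges = defaultdict(set)

    def add_edge(self, source, relation, target):
        # Store the link in both directions so it can be traversed either way.
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def neighbours(self, node, relation):
        return {n for (r, n) in self.edges[node] if r == relation}


def build_graph(records):
    """Link each species to its family (IS_A) and to its grid cell (IS_IN)."""
    graph = KnowledgeGraph()
    threatened = {}
    for species, family, cell, is_threatened in records:
        graph.add_edge(species, "IS_A", family)
        graph.add_edge(species, "IS_IN", cell)
        threatened[species] = is_threatened
    return graph, threatened


def threatened_cooccurring(graph, threatened, focal):
    """Return threatened species sharing at least one cell with `focal`."""
    found = set()
    for cell in graph.neighbours(focal, "IS_IN"):
        for species in graph.neighbours(cell, "IS_IN"):
            if species != focal and threatened[species]:
                found.add(species)
    return sorted(found)


graph, threatened = build_graph(OCCURRENCES)
result = threatened_cooccurring(graph, threatened, "Panthera onca")
print(result)
```

In this toy setting the same graph also answers taxonomic questions, for example `graph.neighbours("family_X", "IS_A")` lists the species attached to that family node.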

CONCLUSIONS

The Biospytial approach reduces the complexity of joining datasets using multiple tabular relations, while its scalable design eases the problem of merging datasets from different sources. Its modular design makes it possible to distribute several instances simultaneously, allowing fast and efficient handling of big ecological datasets. The provided example demonstrates the engine's capabilities in performing basic graph manipulation, analysis and visualizations of taxonomic groups co-occurring in space. The example shows potential avenues for performing novel ecological analyses, biodiversity syntheses and species distribution models aided by a network of taxonomic and spatial relationships.
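
To make the claim about join complexity concrete, the sketch below answers one question two ways: which families occur in grid cells warmer than a threshold. It is an illustration under assumptions, not the engine's implementation: the table and column names, the placeholder species and family, the cell ids and the temperature values are all hypothetical, and plain dictionaries stand in for the graph database while SQLite stands in for the spatial relational database.

```python
import sqlite3

# Hypothetical toy schema and records, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE family (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT, family_id INTEGER);
    CREATE TABLE occurrence (species_id INTEGER, cell_id INTEGER);
    CREATE TABLE climate (cell_id INTEGER PRIMARY KEY, mean_temp_c REAL);
    INSERT INTO family VALUES (1, 'Felidae'), (2, 'family_X');
    INSERT INTO species VALUES (1, 'Panthera onca', 1), (2, 'species_A', 2);
    INSERT INTO occurrence VALUES (1, 10), (2, 10), (2, 20);
    INSERT INTO climate VALUES (10, 26.5), (20, 18.0);
    """
)

THRESHOLD_C = 25.0

# Purely relational route: three joins across four tables.
sql_families = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT f.name
        FROM climate c
        JOIN occurrence o ON o.cell_id = c.cell_id
        JOIN species s ON s.id = o.species_id
        JOIN family f ON f.id = s.family_id
        WHERE c.mean_temp_c > ?
        ORDER BY f.name
        """,
        (THRESHOLD_C,),
    )
]

# Hybrid route: linkage lives in a graph (plain dicts here), while tabular
# climate values stay in the relational table and are looked up by cell id.
cell_to_species = {10: {"Panthera onca", "species_A"}, 20: {"species_A"}}
species_to_family = {"Panthera onca": "Felidae", "species_A": "family_X"}

warm_cells = [
    row[0]
    for row in conn.execute(
        "SELECT cell_id FROM climate WHERE mean_temp_c > ?", (THRESHOLD_C,)
    )
]
graph_families = sorted(
    {species_to_family[sp] for cell in warm_cells for sp in cell_to_species[cell]}
)

print(sql_families, graph_families)
```

Both routes return the same families. In the hybrid route the relational query touches a single table and the remaining steps follow stored links, which is the kind of simplification the conclusions describe.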

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/20a2/7213554/2b8f49431137/giaa039fig1.jpg
