BioRels' data infrastructure: a scientific schema and exchange standard to transform and enhance biological data sciences.

Author information

Wang Jibo, Turney Amanda, Murray Lauren, Craven Andrew M, Bragger-Wilkinson Patty, Dos Santos Bruno, Martasek Jaroslav, Desaphy Jeremy

Affiliations

Lilly Genetic Medicines, Eli Lilly and Company, Indianapolis, IN 46285, United States.

Research-IDS, Eli Lilly and Company, Indianapolis, IN 46285, United States.

Publication information

Nucleic Acids Res. 2025 Mar 20;53(6). doi: 10.1093/nar/gkaf254.

Abstract

Our understanding of biology and medicinal sciences, augmented by advances in data structures and algorithms, has resulted in a proliferation of thousands of open-source resources, tools, and websites built by the scientific community to access, process, store, and visualize biological data. However, such data have become increasingly complex and heterogeneous, leading to an entangled web of relationships and external identifiers. Despite the emergence of infrastructure such as data lakes, scientists remain responsible for the time-consuming and costly work of finding, extracting, cleaning, preparing, and maintaining such data sources while following the FAIR principles. To better understand this complexity, we lay out a representation of the mainstream data ecosystem, describing the natural relationships and concepts found in biology. Building on this representation and on the fundamental principles of data unicity and atomicity, we introduce BioRels, an automated and standardized data preparation workstream that aims to improve reproducibility and speed for all scientists and handles up to 145 billion data points. BioRels supports complex queries that span several data sources seamlessly and provides an exchange format, BIORJ, to export and import data with all its dependencies and metadata. Finally, we describe the advantages, limitations, applications, and perspectives of a future approach, BioRels-KB, to expand data preparation capabilities.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e769/11969666/dfade16aa9ea/gkaf254figgra1.jpg
