BioRels' data infrastructure: a scientific schema and exchange standard to transform and enhance biological data sciences.

Author information

Wang Jibo, Turney Amanda, Murray Lauren, Craven Andrew M, Bragger-Wilkinson Patty, Dos Santos Bruno, Martasek Jaroslav, Desaphy Jeremy

Affiliations

Lilly Genetic Medicines, Eli Lilly and Company, Indianapolis, IN 46285, United States.

Research-IDS, Eli Lilly and Company, Indianapolis, IN 46285, United States.

Publication information

Nucleic Acids Res. 2025 Mar 20;53(6). doi: 10.1093/nar/gkaf254.

DOI: 10.1093/nar/gkaf254
PMID: 40183635
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11969666/
Abstract

Our understanding of biology and medicinal sciences, augmented by advances in data structures and algorithms, has resulted in a proliferation of thousands of open-source resources, tools, and websites that the scientific community has built to access, process, store, and visualize biological data. However, such data have become increasingly complex and heterogeneous, leading to an entangled web of relationships and external identifiers. Despite the emergence of infrastructure such as data lakes, scientists are still responsible for the time-consuming and costly exercise of finding, extracting, cleaning, preparing, and maintaining such data sources while following the FAIR principles. To better understand this complexity, we lay out a representation of the mainstream data ecosystem, describing the natural relationships and concepts found in biology. Building on this representation and on the fundamental principles of data unicity and atomicity, we introduce BioRels, an automated and standardized data preparation workstream that aims to improve reproducibility and speed for all scientists and handles up to 145 billion data points. BioRels enables complex queries across several data sources seamlessly and provides an exchange format, BIORJ, to export and import data with all its dependencies and metadata. Finally, we describe the advantages, limitations, applications, and perspectives of a future approach, BioRels-KB, to expand data preparation capabilities.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e769/11969666/dfade16aa9ea/gkaf254figgra1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e769/11969666/985b67212dc5/gkaf254fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e769/11969666/5288f68f0a32/gkaf254fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e769/11969666/f0a97c907c83/gkaf254fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e769/11969666/28fa0efc65fc/gkaf254fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e769/11969666/3c05f80fa8fa/gkaf254fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e769/11969666/dd92187c9e6a/gkaf254fig6.jpg

