BioBricks.ai: a versioned data registry for life sciences data assets.

Authors

Gao Yifan, Mughal Zakariyya, Jaramillo-Villegas Jose A, Corradi Marie, Borrel Alexandre, Lieberman Ben, Sharif Suliman, Shaffer John, Fecho Karamarie, Chatrath Ajay, Maertens Alexandra, Teunis Marc A T, Kleinstreuer Nicole, Hartung Thomas, Luechtefeld Thomas

Affiliations

Center for Alternatives to Animal Testing, Johns Hopkins University, Baltimore, MD, United States.

Insilica, Bethesda, MD, United States.

Publication information

Front Artif Intell. 2025 Aug 13;8:1599412. doi: 10.3389/frai.2025.1599412. eCollection 2025.

DOI: 10.3389/frai.2025.1599412
PMID: 40880880
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12380851/

Abstract

INTRODUCTION

Researchers in biomedicine and public health often spend weeks locating, cleansing, and integrating data from disparate sources before analysis can begin. This redundancy slows discovery and leads to inconsistent pipelines.

METHODS

We created BioBricks.ai, an open, centralized repository that packages public biological and chemical datasets as modular "bricks." Each brick is a Data Version Control (DVC) Git repository containing an extract‑transform‑load (ETL) pipeline. A package‑manager-like interface handles installation, dependency resolution, and updates, while data are delivered through a unified backend (https://biobricks.ai).
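The dependency resolution step mentioned above can be illustrated with a minimal sketch. This is not the BioBricks.ai implementation: the registry contents, the brick names, and the `install_order` helper are hypothetical, and the sketch only assumes that each brick declares the bricks it depends on. It uses a topological sort from the Python standard library to compute an order in which dependencies come before the bricks that need them.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: brick name -> names of bricks it depends on.
# These names are illustrative only and do not refer to real bricks.
REGISTRY = {
    "chem-ids": [],
    "assay-results": ["chem-ids"],
    "tox-summary": ["assay-results", "chem-ids"],
}


def install_order(target, registry):
    """Return every brick needed for `target`, dependencies first."""
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue  # already visited
        if name not in registry:
            raise KeyError(f"unknown brick: {name}")
        needed[name] = registry[name]
        stack.extend(registry[name])
    # TopologicalSorter takes a mapping of node -> predecessors and
    # raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(needed).static_order())


order = install_order("tox-summary", REGISTRY)
print(order)
```

A real package manager would also have to handle version constraints and downloads; the sketch covers only the ordering problem.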

RESULTS

The current release provides >90 curated datasets spanning genomics, proteomics, cheminformatics, and epidemiology. Bricks can be combined programmatically to build composite resources; benchmark use‑cases show that assembling multi‑dataset analytic cohorts is reduced from days to minutes compared with bespoke scripts.
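As a rough illustration of combining bricks programmatically, the sketch below joins two small tables on a shared identifier to form a composite resource. The table layouts, column names, and all values are toy assumptions made for this example. They are not taken from any BioBricks.ai dataset, and the abstract does not specify the platform's actual data format or API.

```python
import sqlite3

# Toy stand-ins for two bricks that share a chemical identifier.
# All identifiers and values are made up for illustration.
chemicals = [
    ("C1", "compound-one"),
    ("C2", "compound-two"),
    ("C3", "compound-three"),
]
assays = [
    ("C1", "assay-A", 1),
    ("C2", "assay-A", 0),
    ("C1", "assay-B", 1),
]


def build_composite(chemicals, assays):
    """Join the two toy tables on chem_id and return the combined rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE chemicals (chem_id TEXT PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE assays (chem_id TEXT, assay TEXT, active INTEGER)")
    con.executemany("INSERT INTO chemicals VALUES (?, ?)", chemicals)
    con.executemany("INSERT INTO assays VALUES (?, ?, ?)", assays)
    rows = con.execute(
        "SELECT c.name, a.assay, a.active "
        "FROM chemicals c JOIN assays a ON a.chem_id = c.chem_id "
        "ORDER BY c.name, a.assay"
    ).fetchall()
    con.close()
    return rows


composite = build_composite(chemicals, assays)
for row in composite:
    print(row)
```

The inner join drops chemicals with no assay record (here `C3`); a left join would keep them if the analysis needs the full chemical list.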

DISCUSSION

BioBricks.ai accelerates data access, promotes reproducible workflows, and lowers the barrier for integrating heterogeneous public datasets. By treating data as version‑controlled software, the platform encourages community contributions and reduces redundant engineering effort. Continued expansion of brick coverage and automated provenance tracking will further enhance FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the life‑science community.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5801/12380851/c41c5d0de832/frai-08-1599412-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5801/12380851/08438ecfd599/frai-08-1599412-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5801/12380851/19dba8b5f16a/frai-08-1599412-g003.jpg
