Data Provenance in Biomedical Research: Scoping Review.

Affiliations

Medical Informatics Group, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.

Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany.

Publication information

J Med Internet Res. 2023 Mar 27;25:e42289. doi: 10.2196/42289.

DOI: 10.2196/42289
PMID: 36972116
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10132013/

Abstract

BACKGROUND

Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research.

OBJECTIVE

The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption.

METHODS

Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures.

RESULTS

We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV.
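
To make the feature sets named above more concrete, the following is a minimal sketch of capture, storage, retrieval, and a simple lineage analysis over PROV-style statements. It is not taken from any of the reviewed articles. The relation names `wasGeneratedBy`, `used`, and `wasAssociatedWith` come from the W3C PROV data model, while the `ProvenanceStore` class, its methods, and the file and activity names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store of (subject, relation, object) statements."""

    relations: list = field(default_factory=list)

    def capture(self, subject, relation, obj):
        # Capture and storage: append one provenance statement.
        self.relations.append((subject, relation, obj))

    def retrieve(self, subject):
        # Retrieval: every statement recorded about one subject.
        return [r for r in self.relations if r[0] == subject]

    def lineage(self, node):
        # Analysis: follow wasGeneratedBy/used edges transitively and
        # return all upstream activities and entities.
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for subj, rel, obj in self.relations:
                if subj == current and rel in ("wasGeneratedBy", "used") and obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
        return seen


def build_example():
    # Illustrative (made-up) pipeline: raw export -> cleaning -> aggregation.
    store = ProvenanceStore()
    store.capture("cleaned.csv", "wasGeneratedBy", "cleaning_run")
    store.capture("cleaning_run", "used", "raw_export.csv")
    store.capture("cleaning_run", "wasAssociatedWith", "analyst_a")
    store.capture("summary_table", "wasGeneratedBy", "aggregation_run")
    store.capture("aggregation_run", "used", "cleaned.csv")
    return store


if __name__ == "__main__":
    store = build_example()
    print(sorted(store.lineage("summary_table")))
```

In this sketch, asking for the lineage of `summary_table` walks back through both processing steps to `raw_export.csv`, which is the kind of question a provenance analysis feature is meant to answer. A real system would add persistent storage and a standard serialization.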

CONCLUSIONS

The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.


Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37ec/10132013/20b149731513/jmir_v25i1e42289_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37ec/10132013/f768f131fcb7/jmir_v25i1e42289_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37ec/10132013/fba3e7a154c3/jmir_v25i1e42289_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37ec/10132013/4dfd8d68c1e0/jmir_v25i1e42289_fig4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37ec/10132013/a53931ccb0a7/jmir_v25i1e42289_fig5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37ec/10132013/b693d364d378/jmir_v25i1e42289_fig6.jpg


