Enabling Scientific Reproducibility through FAIR Data Management: An ontology-driven deep learning approach in the NeuroBridge Project.

Affiliations

Pennsylvania State University, State College, PA, USA.

University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Publication details

AMIA Annu Symp Proc. 2023 Apr 29;2022:1135-1144. eCollection 2022.

PMID:37128458

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10148274/

Abstract

Scientific reproducibility that effectively leverages existing study data is critical to the advancement of research in many disciplines, including neuroscience, which uses imaging and electrophysiology modalities as primary endpoints or key dependencies in studies. We are developing an integrated search platform called NeuroBridge that enables researchers to find relevant study datasets for testing a hypothesis or replicating a published finding, without having to perform a difficult search from scratch that involves contacting individual study authors and locating where to download the data. In this paper, we describe the development of a metadata ontology based on the World Wide Web Consortium (W3C) PROV specifications, which we used to create a corpus of semantically annotated published papers. This annotated corpus was used to train a deep learning model that supports automated identification of candidate datasets related to neuroimaging-based neurocognitive assessment of subjects with drug abuse or schizophrenia. We built on our previous work in the Provenance for Clinical and Health Research (ProvCaRe) project to model metadata information in the NeuroBridge ontology, and used this ontology to annotate 51 articles with a Web-based tool called Inception. A Bidirectional Encoder Representations from Transformers (BERT) neural network model, trained on the annotated corpus, was used to classify and rank papers relevant to five research hypotheses, and the results were evaluated independently by three users for accuracy and recall. Our combined use of the NeuroBridge ontology and the deep learning model outperforms the existing PubMed Central (PMC) search engine and demonstrates considerable trainability and transparency compared with typical free-text search. An initial version of the NeuroBridge portal is available at: https://neurobridges.org/.
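The abstract states that the BERT model's output was used to classify and rank papers and that three users independently evaluated the results. As a hedged illustration only, the sketch below shows one way such a ranked list could be scored. The classifier scores are hypothetical stand-ins for model outputs. A paper is counted as relevant when a majority of reviewers marked it so, which is an assumption, since the abstract does not say how the three judgments were combined. Precision and recall are then computed over the top-k results. All function names and paper IDs are hypothetical and are not taken from the NeuroBridge code.

```python
from typing import Dict, List, Set, Tuple


def rank_papers(scores: Dict[str, float]) -> List[str]:
    """Order paper IDs by descending classifier score (ties broken by ID).

    The scores are hypothetical stand-ins for a relevance model's outputs.
    """
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def majority_relevant(judgments: Dict[str, List[bool]]) -> Set[str]:
    """Assumption: a paper is relevant if more than half of the reviewers said so."""
    return {pid for pid, votes in judgments.items() if sum(votes) * 2 > len(votes)}


def precision_recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision and recall of the top-k ranked papers against the relevant set."""
    top_k = ranked[:k]
    hits = sum(1 for pid in top_k if pid in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical scores and judgments from three reviewers.
    scores = {"paper_a": 0.91, "paper_b": 0.35, "paper_c": 0.78, "paper_d": 0.12}
    judgments = {
        "paper_a": [True, True, False],
        "paper_b": [False, False, True],
        "paper_c": [True, True, True],
        "paper_d": [True, False, True],
    }
    ranked = rank_papers(scores)
    relevant = majority_relevant(judgments)
    print(ranked, sorted(relevant), precision_recall_at_k(ranked, relevant, 2))
```

Majority voting is only one choice for combining reviewers. Requiring unanimous agreement would yield a stricter relevant set and typically lower recall for the same ranking.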

Similar articles

Enabling Scientific Reproducibility through FAIR Data Management: An ontology-driven deep learning approach in the NeuroBridge Project.

AMIA Annu Symp Proc. 2023 Apr 29;2022:1135-1144. eCollection 2022.

NeuroBridge ontology: computable provenance metadata to give the long tail of neuroimaging data a FAIR chance for secondary use.

Front Neuroinform. 2023 Jul 24;17:1216443. doi: 10.3389/fninf.2023.1216443. eCollection 2023.

NeuroBridge: a prototype platform for discovery of the long-tail neuroimaging data.

Front Neuroinform. 2023 Aug 31;17:1215261. doi: 10.3389/fninf.2023.1215261. eCollection 2023.

ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata.

Int J Med Inform. 2019 Jan;121:10-18. doi: 10.1016/j.ijmedinf.2018.10.009. Epub 2018 Nov 3.

Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description.

AMIA Annu Symp Proc. 2017 Feb 10;2016:1070-1079. eCollection 2016.

A semantic proteomics dashboard (SemPoD) for data management in translational research.

BMC Syst Biol. 2012;6 Suppl 3(Suppl 3):S20. doi: 10.1186/1752-0509-6-S3-S20. Epub 2012 Dec 17.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper).

On Move Meaningful Internet Syst. 2016 Oct;10033:699-708. doi: 10.1007/978-3-319-48472-3_43. Epub 2016 Oct 18.

ProvCaRe Semantic Provenance Knowledgebase: Evaluating Scientific Reproducibility of Research Studies.

AMIA Annu Symp Proc. 2018 Apr 16;2017:1705-1714. eCollection 2017.

Semantic Provenance Graph for Reproducibility of Biomedical Research Studies: Generating and Analyzing Graph Structures from Published Literature.

Stud Health Technol Inform. 2019 Aug 21;264:328-332. doi: 10.3233/SHTI190237.

Cited by

Large language models can extract metadata for annotation of human neuroimaging publications.

Front Neuroinform. 2025 Aug 20;19:1609077. doi: 10.3389/fninf.2025.1609077. eCollection 2025.

Large Language Models Can Extract Metadata for Annotation of Human Neuroimaging Publications.

bioRxiv. 2025 May 14:2025.05.13.653828. doi: 10.1101/2025.05.13.653828.

Provenance Information for Biomedical Data and Workflows: Scoping Review.

J Med Internet Res. 2024 Aug 23;26:e51297. doi: 10.2196/51297.

NeuroBridge: a prototype platform for discovery of the long-tail neuroimaging data.

Front Neuroinform. 2023 Aug 31;17:1215261. doi: 10.3389/fninf.2023.1215261. eCollection 2023.

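NeuroBridge ontology: computable provenance metadata to give the long tail of neuroimaging data a FAIR chance for secondary use.

Front Neuroinform. 2023 Jul 24;17:1216443. doi: 10.3389/fninf.2023.1216443. eCollection 2023.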

Ontology-based feature engineering in machine learning workflows for heterogeneous epilepsy patient records.

Sci Rep. 2022 Nov 12;12(1):19430. doi: 10.1038/s41598-022-23101-3.
