Pennsylvania State University, State College, PA, USA.
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
AMIA Annu Symp Proc. 2023 Apr 29;2022:1135-1144. eCollection 2022.
Scientific reproducibility that effectively leverages existing study data is critical to the advancement of research in many disciplines, including neuroscience, where imaging and electrophysiology modalities serve as primary endpoints or key dependencies in studies. We are developing an integrated search platform called NeuroBridge that enables researchers to find relevant study datasets for testing a hypothesis or replicating a published finding without having to perform a difficult search from scratch, which can include contacting individual study authors and locating the site from which to download the data. In this paper, we describe the development of a metadata ontology, based on the World Wide Web Consortium (W3C) PROV specifications, that was used to create a corpus of semantically annotated published papers. This annotated corpus was used to train a deep learning model that supports automated identification of candidate datasets related to neurocognitive assessment of subjects with drug abuse or schizophrenia using neuroimaging. We built on our previous work in the Provenance for Clinical and Health Research (ProvCaRe) project to model metadata information in the NeuroBridge ontology and used this ontology to annotate 51 articles with a Web-based tool called Inception. A Bidirectional Encoder Representations from Transformers (BERT) neural network model, trained on the annotated corpus, was used to classify and rank papers relevant to five research hypotheses, and the results were evaluated independently by three users for accuracy and recall. The combined use of the NeuroBridge ontology and the deep learning model outperforms the existing PubMed Central (PMC) search engine and offers considerable trainability and transparency compared with typical free-text search. An initial version of the NeuroBridge portal is available at https://neurobridges.org/.
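To make the idea of ranking ontology-annotated papers against a hypothesis concrete, the following is a minimal sketch and not the system described above: it replaces the BERT model with a plain Jaccard overlap between concept sets, and every name in it (`AnnotatedPaper`, `rank_papers`, `recall_at_k`, the paper identifiers, and the concept labels) is a hypothetical placeholder rather than a term from the NeuroBridge ontology. It shows only how a ranked list could be scored for recall against relevance judgments.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class AnnotatedPaper:
    """A paper plus the ontology concepts annotated in it (all values hypothetical)."""
    paper_id: str
    concepts: FrozenSet[str]


def rank_papers(query_concepts: Set[str], papers: Iterable[AnnotatedPaper]) -> List[str]:
    """Return paper ids ordered by Jaccard overlap with the query, best first.

    Assumption: simple set overlap stands in for the trained classifier.
    Papers sharing no concept with the query are dropped.
    """
    scored = []
    for paper in papers:
        union = query_concepts | paper.concepts
        score = len(query_concepts & paper.concepts) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, paper.paper_id))
    # Sort by descending score; break ties by id so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [paper_id for _, paper_id in scored]


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant papers that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for paper_id in ranked_ids[:k] if paper_id in relevant_ids)
    return hits / len(relevant_ids)


# Toy corpus with made-up concept labels.
corpus = [
    AnnotatedPaper("P1", frozenset({"Schizophrenia", "FunctionalMRI", "NeurocognitiveTest"})),
    AnnotatedPaper("P2", frozenset({"SubstanceUseDisorder", "StructuralMRI"})),
    AnnotatedPaper("P3", frozenset({"Schizophrenia", "StructuralMRI"})),
    AnnotatedPaper("P4", frozenset({"HealthyControl", "EEG"})),
]
hypothesis = {"Schizophrenia", "StructuralMRI"}

ranking = rank_papers(hypothesis, corpus)
print(ranking)
print(recall_at_k(ranking, {"P1", "P3"}, k=2))
```

In this toy setting the relevance set `{"P1", "P3"}` plays the role of the user judgments; in practice such judgments would come from independent evaluators, as in the study.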