McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
Department of Neurology, McGovern School of Medicine, The University of Texas Health Science Center at Houston, Houston, TX, USA.
BMC Med Inform Decis Mak. 2023 Aug 4;23(Suppl 1):151. doi: 10.1186/s12911-023-02250-z.
In the United States, the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI) are two major data sharing resources for Alzheimer's disease (AD) research. NACC and ADNI strive to make their data more FAIR (findable, accessible, interoperable, and reusable) for the broader research community. However, little work has been done to harmonize the two resources or to support cross-cohort interoperability between them.
In this paper, we leverage an ontology-based approach to harmonize data elements in the two resources and develop a web-based query system to search patient cohorts across the two resources. We first mapped data elements across NACC and ADNI, and performed value harmonization for the mapped data elements with inconsistent permissible values. Then we built an Alzheimer's Disease Data Element Ontology (ADEO) to model the mapped data elements in NACC and ADNI. We further developed a prototype cross-cohort query system to search patient cohorts across NACC and ADNI.
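The mapping and value harmonization steps described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the variable names (`A_SEX`, `B_GENDER`), the raw codes, and the `harmonize` helper are invented for this example and are not taken from the NACC or ADNI data dictionaries.

```python
from typing import Any, Dict

# Hypothetical mapping: common concept -> variable name in each cohort.
# The variable names are invented for illustration only.
VARIABLE_MAP: Dict[str, Dict[str, str]] = {
    "sex": {"NACC": "A_SEX", "ADNI": "B_GENDER"},
    "age": {"NACC": "A_AGE", "ADNI": "B_AGE"},
}

# Hypothetical value harmonization: (cohort, concept) -> {raw code: common value}.
# Only concepts whose permissible values differ across cohorts need an entry.
VALUE_MAP: Dict[tuple, Dict[Any, str]] = {
    ("NACC", "sex"): {1: "male", 2: "female"},
    ("ADNI", "sex"): {"Male": "male", "Female": "female"},
}


def harmonize(cohort: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite one cohort-specific record in terms of common concepts."""
    out: Dict[str, Any] = {}
    for concept, per_cohort in VARIABLE_MAP.items():
        var = per_cohort.get(cohort)
        if var is None or var not in record:
            continue  # this cohort has no value for the concept
        raw = record[var]
        codes = VALUE_MAP.get((cohort, concept))
        # Recode when a value map exists; otherwise pass the value through.
        out[concept] = codes.get(raw) if codes is not None else raw
    return out


nacc_row = harmonize("NACC", {"A_SEX": 2, "A_AGE": 71})
adni_row = harmonize("ADNI", {"B_GENDER": "Female", "B_AGE": 71})
```

In this sketch both records end up with the same keys and the same coded values, which is the property a cross-cohort query relies on.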
After manual review, we found 172 mappings between NACC and ADNI. These 172 mappings were further used to construct common concepts in ADEO. Our data element mapping and harmonization resulted in five files storing common concepts, variables in NACC and ADNI, mappings between variables and common concepts, permissible values of categorical type data elements, and coding inconsistency harmonization, respectively. Our cross-cohort query system consists of three core architectural elements: a web-based interface, an advanced query engine, and a backend MongoDB database.
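One way a query engine of this kind could use the mapping files is to translate a criterion stated on a common concept into one MongoDB-style filter document per cohort. The sketch below is a hedged illustration, not the system's actual code: the variable names, raw codes, and the `build_filters` helper are hypothetical, and the only MongoDB feature it relies on is the standard `$in` query operator. It builds plain dictionaries and does not connect to a database.

```python
from typing import Any, Dict, Iterable

# Hypothetical mapping files, reduced to in-memory dictionaries.
VARIABLE_MAP: Dict[str, Dict[str, str]] = {
    "sex": {"NACC": "A_SEX", "ADNI": "B_GENDER"},
}
VALUE_MAP: Dict[tuple, Dict[Any, str]] = {
    ("NACC", "sex"): {1: "male", 2: "female"},
    ("ADNI", "sex"): {"Male": "male", "Female": "female"},
}


def build_filters(concept: str, wanted: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Translate a concept-level criterion into one filter document per cohort."""
    wanted_set = set(wanted)
    filters: Dict[str, Dict[str, Any]] = {}
    for cohort, var in VARIABLE_MAP[concept].items():
        codes = VALUE_MAP.get((cohort, concept), {})
        # Reverse lookup: which raw codes in this cohort mean a wanted value?
        raw = [code for code, common in codes.items() if common in wanted_set]
        filters[cohort] = {var: {"$in": raw}}
    return filters


filters = build_filters("sex", ["female"])
```

Each resulting dictionary could be passed as the filter argument of a `find` call on the collection that holds the corresponding cohort, assuming one collection per cohort.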
In this work, ADEO has been specifically designed to facilitate data harmonization and cross-cohort query of NACC and ADNI data resources. Although our prototype cross-cohort query system was developed for exploring NACC and ADNI, its backend and frontend framework has been designed and implemented to be generally applicable to other domains for querying patient cohorts from multiple heterogeneous data sources.