Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, TUD Dresden University of Technology, Fetscherstraße 74, 01307, Dresden, Germany.
Department of Internal Medicine I, University Hospital Frankfurt, Goethe University, Frankfurt, Germany.
Orphanet J Rare Dis. 2024 Aug 14;19(1):298. doi: 10.1186/s13023-024-03312-9.
Given the geographical sparsity of rare diseases (RDs), assembling a cohort is often a challenging task. Common data models (CDMs) can harmonize data from disparate sources, and the harmonized data can serve as the basis for decision support systems and artificial intelligence-based studies, leading to new insights in the field. This work aims to support the design of large-scale multi-center studies for rare diseases.
In an interdisciplinary group, we iteratively derived a list of RD data elements in three medical domains (endocrinology, gastroenterology, and pneumonology) based on specialist knowledge and clinical guidelines. We then defined an RD data structure that covered all of these data elements and built Extract, Transform, Load (ETL) processes to transfer this structure into a joint CDM. To ensure the interoperability of the resulting CDM and its reuse for further RD domains, we ultimately mapped it to the Observational Medical Outcomes Partnership (OMOP) CDM. We then included a fourth domain, hematology, as a proof of concept and mapped an acute myeloid leukemia (AML) dataset to the developed CDM.
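The transform step of such an ETL process can be illustrated with a minimal sketch. The abstract does not describe the RD data structure or the target tables in detail, so everything below is an assumption: the `SourceRecord` fields are hypothetical, the target columns follow a subset of the OMOP CDM `person` and `condition_occurrence` tables, and the diagnosis lookup is a placeholder dictionary standing in for a real OMOP vocabulary lookup. Only the gender concept IDs (8507 = male, 8532 = female) are standard OMOP concepts; the condition concept ID is made up.

```python
from dataclasses import dataclass
from datetime import date

# Standard OMOP gender concepts; 0 is the OMOP convention for "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}

# HYPOTHETICAL stand-in for an OMOP vocabulary lookup (source code -> standard concept_id).
# The ID below is a placeholder, not a real OMOP concept ID.
CONDITION_CONCEPTS = {"E22.0": 900001}  # ICD-10-GM E22.0 (acromegaly), endocrinology example


@dataclass
class SourceRecord:
    """HYPOTHETICAL record in a site-specific RD data structure."""
    patient_id: int
    sex: str
    birth_date: date
    diagnosis_code: str
    diagnosis_date: date


@dataclass
class Person:
    """Subset of the OMOP CDM `person` table."""
    person_id: int
    gender_concept_id: int
    year_of_birth: int
    month_of_birth: int
    day_of_birth: int
    race_concept_id: int = 0
    ethnicity_concept_id: int = 0


@dataclass
class ConditionOccurrence:
    """Subset of the OMOP CDM `condition_occurrence` table."""
    condition_occurrence_id: int
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_type_concept_id: int
    condition_source_value: str


def transform(rec: SourceRecord, occurrence_id: int, type_concept_id: int = 0):
    """Map one source record to an OMOP person row and a condition_occurrence row.

    type_concept_id defaults to 0 here; a real ETL would set an appropriate
    OMOP Type Concept describing the provenance of the record.
    """
    person = Person(
        person_id=rec.patient_id,
        gender_concept_id=GENDER_CONCEPTS.get(rec.sex.lower(), 0),
        year_of_birth=rec.birth_date.year,
        month_of_birth=rec.birth_date.month,
        day_of_birth=rec.birth_date.day,
    )
    condition = ConditionOccurrence(
        condition_occurrence_id=occurrence_id,
        person_id=rec.patient_id,
        # Unmapped source codes fall back to concept 0 but keep their source value.
        condition_concept_id=CONDITION_CONCEPTS.get(rec.diagnosis_code, 0),
        condition_start_date=rec.diagnosis_date,
        condition_type_concept_id=type_concept_id,
        condition_source_value=rec.diagnosis_code,
    )
    return person, condition
```

Keeping the original code in the `*_source_value` column while mapping to a standard concept is what allows data from different sites, and later from new RD domains, to be queried uniformly without losing the source information.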
We developed an OMOP-based rare diseases common data model (RD-CDM) using data elements from three domains (endocrinology, gastroenterology, and pneumonology) and tested it using data from the hematology domain. The total study cohort included 61,697 patients. After aligning our modules with the modules of the Medical Informatics Initiative (MII) Core Dataset (CDS), we leveraged the existing MII CDS ETL process. This enabled the seamless transfer of the demographics, diagnoses, procedures, laboratory results, and medication modules from our RD-CDM to OMOP. For phenotypes and genotypes, we developed a second ETL process. Finally, we derived lessons learned for customizing our RD-CDM for different RDs.
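The abstract does not say how the second ETL process represents phenotypes and genotypes in OMOP. As one possible illustration, the sketch below writes Human Phenotype Ontology (HPO) terms and gene variants as rows of the OMOP CDM `observation` table. The choice of `observation`, the placeholder concept IDs, and the function name `phenotypes_and_genotypes_to_observations` are all assumptions made for this example, not the authors' design.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import List, Optional, Tuple

# HYPOTHETICAL lookup from HPO term IDs to OMOP standard concept IDs (placeholder values).
PHENOTYPE_CONCEPTS = {"HP:0001250": 900101}
# HYPOTHETICAL placeholder concept for a generic "gene variant" finding.
GENE_VARIANT_CONCEPT = 900201


@dataclass
class Observation:
    """Subset of the OMOP CDM `observation` table."""
    observation_id: int
    person_id: int
    observation_concept_id: int
    observation_date: date
    observation_type_concept_id: int
    observation_source_value: str
    value_as_string: Optional[str] = None


def phenotypes_and_genotypes_to_observations(
    person_id: int,
    hpo_terms: List[str],
    variants: List[Tuple[str, str]],  # (gene symbol, HGVS description)
    obs_date: date,
    start_id: int = 1,
    type_concept_id: int = 0,  # placeholder; a real ETL would use a proper Type Concept
) -> List[Observation]:
    """Emit one observation row per phenotype term and per gene variant."""
    ids = count(start_id)
    rows: List[Observation] = []
    for term in hpo_terms:
        # Unmapped HPO terms get concept 0 but keep the original term as source value.
        rows.append(Observation(next(ids), person_id,
                                PHENOTYPE_CONCEPTS.get(term, 0),
                                obs_date, type_concept_id, term))
    for gene, hgvs in variants:
        # The gene symbol is kept as source value; the variant text goes into value_as_string.
        rows.append(Observation(next(ids), person_id, GENE_VARIANT_CONCEPT,
                                obs_date, type_concept_id, gene,
                                value_as_string=hgvs))
    return rows
```

Separating this step from the MII-based ETL mirrors the split described above: standard clinical modules reuse an existing pipeline, while RD-specific data types need their own mapping logic.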
This work can serve as a blueprint for other domains, as its modular structure could be extended to novel data types. Reaching a comprehensive CDM requires an interdisciplinary group of stakeholders who actively support the project's progress.
The customized data structure of our RD-CDM can be used to perform multi-center studies that test data-driven hypotheses on a larger scale and take advantage of the analytical tools offered by the Observational Health Data Sciences and Informatics (OHDSI) community.