Lichtner Gregor, Haese Thomas, Brose Sally, Röhrig Larissa, Lysyakova Liudmila, Rudolph Stefanie, Uebe Maria, Sass Julian, Bartschke Alexander, Hillus David, Kurth Florian, Sander Leif Erik, Eckart Falk, Toepfner Nicole, Berner Reinhard, Frey Anna, Dörr Marcus, Vehreschild Jörg Janne, von Kalle Christof, Thun Sylvia
Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.
Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
JMIR Med Inform. 2023 Jul 18;11:e45496. doi: 10.2196/45496.
The COVID-19 pandemic has spurred large-scale, interinstitutional research efforts. To enable these efforts, researchers must agree on data set definitions that not only cover all elements relevant to the respective medical specialty but also are syntactically and semantically interoperable. Therefore, the German Corona Consensus (GECCO) data set was developed as a harmonized, interoperable collection of the most relevant data elements for COVID-19-related patient research. Because the GECCO data set is a compact core data set comprising data across all medical fields, focused research within a particular medical domain requires extension modules that define the data elements most relevant to research in that specialty.

We aimed to (1) specify a workflow for the development of interoperable data set definitions that involves close collaboration between medical experts and information scientists and (2) apply the workflow to develop data set definitions that include the data elements most relevant to COVID-19-related patient research regarding immunization, pediatrics, and cardiology.

We developed a workflow to create data set definitions that were (1) as relevant as possible in content to a specific field of study and (2) universally usable across computer systems, institutions, and countries (ie, interoperable). We then gathered medical experts from 3 specialties (infectious diseases with a focus on immunization, pediatrics, and cardiology) to select the data elements most relevant to COVID-19-related patient research in the respective specialty. We mapped the data elements to international standardized vocabularies and created data exchange specifications using Health Level Seven International (HL7) Fast Healthcare Interoperability Resources (FHIR). All steps were performed in close interdisciplinary collaboration between medical domain experts and medical information specialists. Profiles and vocabulary mappings were syntactically and semantically validated in a 2-stage process.

We created GECCO extension modules for the immunization, pediatrics, and cardiology domains in response to pandemic-related requests. The data elements included in each module were selected, according to the developed consensus-based workflow, by medical experts from these specialties to ensure that the contents aligned with their research needs. We defined data set specifications for 48 immunization, 150 pediatrics, and 52 cardiology data elements that complement the GECCO core data set. We created and published implementation guides, example implementations, and data set annotations for each extension module.

The GECCO extension modules, which contain the data elements most relevant to COVID-19-related patient research on infectious diseases (with a focus on immunization), pediatrics, and cardiology, were defined in an interdisciplinary, iterative, consensus-based workflow that may serve as a blueprint for developing further data set definitions. The GECCO extension modules provide standardized and harmonized definitions of specialty-related data sets that can help enable interinstitutional and cross-country COVID-19 research in these specialties.
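As a rough illustration of what mapping a data element to a standardized vocabulary and expressing it as a FHIR resource can look like, the following minimal Python sketch builds an Observation-shaped dictionary and runs a simplified syntactic check on it. Everything here is an assumption made for illustration: the helper names `make_observation` and `syntactic_problems` are hypothetical, body weight (LOINC 29463-7, unit kg in UCUM) is chosen only as a familiar example and is not claimed to be one of the GECCO data elements, and the check is far weaker than validation against published FHIR profiles.

```python
import json

# Canonical system URIs for the two vocabularies used in this sketch.
LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"


def make_observation(patient_id, loinc_code, display, value, unit):
    """Build a minimal FHIR-style Observation as a plain dict (hypothetical helper)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        # The coded concept carries the vocabulary mapping (semantic layer).
        "code": {
            "coding": [
                {"system": LOINC, "code": loinc_code, "display": display}
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": UCUM,
            "code": unit,
        },
    }


def syntactic_problems(resource):
    """Return a list of structural problems; an empty list means none were found.

    This is a simplified stand-in for real profile validation, not a FHIR validator.
    """
    problems = []
    if resource.get("resourceType") != "Observation":
        problems.append("resourceType must be 'Observation'")
    if not resource.get("status"):
        problems.append("status is required")
    codings = resource.get("code", {}).get("coding", [])
    if not codings:
        problems.append("code.coding must contain at least one coding")
    for coding in codings:
        if not coding.get("system") or not coding.get("code"):
            problems.append("each coding needs a system and a code")
    return problems


if __name__ == "__main__":
    obs = make_observation("example-1", "29463-7", "Body weight", 18.5, "kg")
    print(json.dumps(obs, indent=2))
    print(syntactic_problems(obs))
```

The sketch separates the two concerns named in the abstract: the `code.coding` entry is where semantic interoperability is decided (which vocabulary and which concept), while the structural check corresponds only loosely to the syntactic side. In practice, both would be verified against the published profiles and implementation guides rather than with hand-written rules.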