Firnkorn D, Ganzinger M, Muley T, Thomas M, Knaup P
Daniel Firnkorn, Heidelberg University, Institute of Medical Biometry and Informatics, Im Neuenheimer Feld 305, 69120 Heidelberg, Germany
Methods Inf Med. 2015;54(5):455-60. doi: 10.3414/ME14-02-0030. Epub 2015 Sep 17.
Joint data analysis is a key requirement in medical research networks. Data are available in heterogeneous formats at each network partner and their harmonization is often rather complex. The objective of our paper is to provide a generic approach for the harmonization process in research networks. We applied the process when harmonizing data from three sites for the Lung Cancer Phenotype Database within the German Center for Lung Research.
We developed a spreadsheet-based tool to support the harmonization process for lung cancer data, and a data integration procedure based on Talend Open Studio.
The harmonization process consists of eight steps describing a systematic approach for defining and reviewing source data elements and standardizing common data elements. The steps for defining common data elements and harmonizing them with local data definitions are repeated until consensus is reached. Application of this process for building the phenotype database led to a common basic data set on lung cancer with 285 structured parameters. The Lung Cancer Phenotype Database was realized as an i2b2 research data warehouse.
Data harmonization is a challenging task requiring informatics skills as well as domain knowledge. Our approach facilitates data harmonization by providing guidance through a uniform process that can be applied in a wide range of projects.
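As an illustration only, the core idea of mapping local data definitions onto common data elements can be sketched in a few lines of Python. This is not the spreadsheet tool or the Talend Open Studio procedure used in the project; the site names, element names, and value codes below are hypothetical and chosen purely for the example. The helper that lists unmapped elements hints at why the definition and harmonization steps are repeated: whatever remains unmapped goes back to the partners for another round of review.

```python
from dataclasses import dataclass, field


@dataclass
class ElementMapping:
    """Maps one local (site-specific) data element to a common data element."""
    site: str
    local_name: str
    common_name: str
    # Translation of local value codes to common value codes (hypothetical).
    value_map: dict = field(default_factory=dict)


# Hypothetical mappings for two sites that encode the same concept differently.
MAPPINGS = [
    ElementMapping("site_a", "sex", "gender", {"m": "male", "f": "female"}),
    ElementMapping("site_b", "geschlecht", "gender", {1: "male", 2: "female"}),
]


def harmonize(record, site, mappings):
    """Translate one local record into common data elements."""
    harmonized = {}
    for m in mappings:
        if m.site != site or m.local_name not in record:
            continue
        raw = record[m.local_name]
        # Values without a translation are passed through unchanged.
        harmonized[m.common_name] = m.value_map.get(raw, raw)
    return harmonized


def unmapped_elements(record, site, mappings):
    """List local elements that still lack a common definition."""
    known = {m.local_name for m in mappings if m.site == site}
    return sorted(set(record) - known)


if __name__ == "__main__":
    local = {"sex": "m", "smoker": "y"}
    print(harmonize(local, "site_a", MAPPINGS))
    print(unmapped_elements(local, "site_a", MAPPINGS))
```

In this sketch, the `smoker` element has no mapping yet, so it would be reported for discussion in the next iteration rather than silently dropped.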