Ong Toan, Pradhananga Rosina, Holve Erin, Kahn Michael G
Department of Pediatrics, University of Colorado Anschutz Medical Campus.
AcademyHealth.
EGEMS (Wash DC). 2017 Jun 13;5(1):10. doi: 10.5334/egems.222.
Contributing health data to national, regional, and local networks or registries requires that data stored in local systems, with local structures and codes, be extracted, transformed, and loaded into a standard format called a Common Data Model (CDM). These processes, called Extract, Transform, Load (ETL), require data partners or contributors to invest in costly technical resources with specialized skills in data models, terminologies, and programming. Given the wide range of tasks, skills, and technologies required to transform data into a CDM, a classification of ETL challenges can help identify needed resources, which in turn may encourage data partners with fewer technical capabilities to participate in data-sharing networks.
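To make the three ETL phases concrete, the following is a minimal sketch and not the method of any particular network. The local column names (`mrn`, `sex_code`, `visit_dt`), the code mapping, and the target table `cdm_encounter` are all hypothetical and chosen only for illustration; a real CDM specifies its own tables and value sets.

```python
import csv
import io
import sqlite3

# Hypothetical local export: column names and codes are invented for illustration.
LOCAL_EXPORT = """mrn,sex_code,visit_dt
1001,1,03/15/2016
1002,2,11/02/2016
1003,9,07/21/2016
"""

# Assumed mapping from local sex codes to a standard value set.
# Codes without a mapping fall back to "UN" (unknown).
SEX_MAP = {"1": "M", "2": "F"}


def extract(text):
    """Extract: read rows from the local system's export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: map local codes and date formats to the target conventions."""
    records = []
    for row in rows:
        month, day, year = row["visit_dt"].split("/")
        records.append((
            row["mrn"],
            SEX_MAP.get(row["sex_code"], "UN"),
            f"{year}-{month}-{day}",  # MM/DD/YYYY -> ISO 8601 date
        ))
    return records


def load(records, conn):
    """Load: write the transformed records into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cdm_encounter "
        "(patient_id TEXT, sex TEXT, admit_date TEXT)"
    )
    conn.executemany("INSERT INTO cdm_encounter VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three phases in order."""
    load(transform(extract(text)), conn)
```

Even in this toy form, each phase depends on a different skill: knowledge of the local export, knowledge of the terminology mapping, and knowledge of the target data model.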
We conducted key-informant interviews with data partner representatives to survey the ETL challenges faced in clinical data research networks (CDRNs) and registries. A list of ETL challenges, organized into six themes, was vetted during a one-day workshop with a wide range of network stakeholders, including data partners, researchers, and policy experts.
We identified 24 technical ETL challenges related to the data-sharing process. Workshop participants rated all of these challenges as "important" or "very important" on a five-point Likert scale. Based on these findings, we developed a framework for categorizing ETL challenges according to ETL phases, themes, and levels of data network participation.
Overcoming ETL technical challenges requires significant investments in a broad array of information technologies and human resources. Identifying these technical obstacles can inform resource allocation that minimizes the barriers and cost of entry for new data partners joining existing networks, which in turn can expand the inclusiveness and diversity of data networks. This paper offers information and a guiding framework that data partners can use to ascertain the challenges associated with contributing data to data networks.