Stingone Jeanette A, Bledsoe H C, Cooney Grace, Diaz-Insua Mireya, Faustman Elaine, Fecho Karamarie, Gouripeddi Ramkiran, Holmes Philip, Kaeli David, Lozoya Oswaldo, Masci Anna Maria, Narayan Hina, Schmitt Charles, Shatz Maria, Tracy Wren
Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, New York, USA.
ICF, Reston, Virginia, USA.
Environ Health Perspect. 2025 Jun 6. doi: 10.1289/EHP15410.
The field of environmental health sciences increasingly demands comprehensive and diverse datasets, particularly in response to emerging research areas such as climate change, mixtures, and exposomics. The data needed to address the complexity of environmental health research questions often extend beyond the boundaries of a single study or data resource. Traditional data management approaches struggle to harmonize the ever-expanding and heterogeneous data sources needed for research in the environmental health sciences. Harmonization may help address this issue, as it involves aligning and standardizing various elements of data to allow comprehensive analysis, data pooling, and interpretation across studies.
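The alignment step can be illustrated with a minimal sketch, assuming two hypothetical studies that record the same concepts (blood lead and sex) under different variable names, units, and codes. Every name below (`study_a`, `study_b`, `MAP_A`, `MAP_B`, `harmonize`, and all variable names and codings) is invented for illustration and is not drawn from the studies in the use case; the sketch uses only the Python standard library.

```python
"""Minimal sketch: harmonize two hypothetical study datasets into a common schema.

All variable names, codings, and units are hypothetical, invented for illustration.
"""
from typing import Callable

# Hypothetical per-study records (one dict per participant).
study_a = [
    {"id": "A1", "pb_blood_ugdl": 1.2, "sex": 1},
    {"id": "A2", "pb_blood_ugdl": 0.8, "sex": 2},
]
study_b = [
    {"subject": "B1", "blood_lead_ugL": 15.0, "gender": "F"},
]

# Mapping: common variable -> (study-specific source variable, transform to common units/codes).
Mapping = dict[str, tuple[str, Callable]]

MAP_A: Mapping = {
    "participant_id": ("id", str),
    "blood_lead_ug_dL": ("pb_blood_ugdl", float),  # already in ug/dL
    "sex": ("sex", lambda code: {1: "male", 2: "female"}[code]),
}
MAP_B: Mapping = {
    "participant_id": ("subject", str),
    "blood_lead_ug_dL": ("blood_lead_ugL", lambda v: v / 10.0),  # ug/L -> ug/dL
    "sex": ("gender", lambda code: {"M": "male", "F": "female"}[code]),
}


def harmonize(records: list[dict], mapping: Mapping, study: str) -> list[dict]:
    """Rename, convert, and recode each record into the common schema."""
    out = []
    for rec in records:
        row = {common: fn(rec[src]) for common, (src, fn) in mapping.items()}
        row["study"] = study  # keep provenance so pooled rows remain traceable
        out.append(row)
    return out


# Pool the harmonized rows from both studies into one dataset.
pooled = harmonize(study_a, MAP_A, "A") + harmonize(study_b, MAP_B, "B")
for row in pooled:
    print(row)
```

Keeping an explicit mapping per study, rather than editing the source data in place, makes each harmonization decision reviewable, and the `study` column preserves provenance after pooling.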
The primary objective is to inform researchers about the transformative potential of embracing harmonization methodologies and to motivate contributions to ongoing efforts, thereby fostering advances in environmental health research.
Using the Environmental Health Language Collaborative's Data Harmonization Use Case, we provide a practical illustration of existing data harmonization approaches, identify gaps, and emphasize future research and application directions. We selected two publicly available environmental epidemiology studies on the topic of childhood asthma and three studies on the topic of biomarkers of metals exposure during pregnancy and birth outcomes and applied several existing harmonization approaches to assess interoperability.
Our process revealed the potential limitations of many existing harmonization approaches, with notable failures to identify common variables across independent datasets and lack of agreement between human and computer-based approaches. This use case identified various challenges with existing approaches, including reliance on often incomplete data documentation and large amounts of manual effort. To address these challenges, we recommend the continued advancement and dissemination of community data standards, the development of software and tools to facilitate harmonization through automation, and strategic efforts to promote engagement in data harmonization within the environmental health sciences community. Collaborative science is needed to advance our understanding of environmental contributors to health, and realizing the harmonization potential of our scientific data is a step toward improved collaboration. https://doi.org/10.1289/EHP15410.
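One way a computer-based approach can disagree with human curators is illustrated by a minimal sketch, assuming a simple string-similarity matcher (Python's `difflib.SequenceMatcher`) applied to hypothetical data-dictionary labels. This is a stand-in technique, not one of the harmonization tools evaluated in the use case; the labels, the manual mapping, the 0.5 threshold, and the helper names (`similarity`, `auto_match`, `agreement`) are all hypothetical. Because the scores depend on `difflib`'s matching heuristic, no expected output is shown.

```python
"""Minimal sketch: propose cross-study variable matches from documentation labels
and measure agreement with a manual (human) mapping.

Labels, the manual mapping, and the threshold are hypothetical; difflib string
similarity is a stand-in, not a tool assessed in the use case.
"""
from difflib import SequenceMatcher

# Hypothetical data-dictionary labels for two studies.
labels_a = {
    "pb_blood": "blood lead concentration",
    "asthma_dx": "doctor diagnosed asthma",
    "mat_age": "maternal age at delivery",
}
labels_b = {
    "bll": "lead level in whole blood",
    "asth_ever": "ever diagnosed with asthma by physician",
    "age_mom": "mother age at birth",
}

# Hypothetical mapping a human curator might produce.
manual = {"pb_blood": "bll", "asthma_dx": "asth_ever", "mat_age": "age_mom"}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-sequence similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def auto_match(src: dict, tgt: dict, threshold: float = 0.5) -> dict:
    """Pair each source variable with its most similar target label, or None if too weak."""
    matches = {}
    for var_a, label_a in src.items():
        best_var, best_score = max(
            ((var_b, similarity(label_a, label_b)) for var_b, label_b in tgt.items()),
            key=lambda pair: pair[1],
        )
        matches[var_a] = best_var if best_score >= threshold else None
    return matches


def agreement(auto: dict, reference: dict) -> float:
    """Fraction of reference pairs that the automated matcher reproduced exactly."""
    shared = [v for v in reference if v in auto]
    if not shared:
        return 0.0
    return sum(auto[v] == reference[v] for v in shared) / len(shared)


auto = auto_match(labels_a, labels_b)
print(auto)
print(f"agreement with manual mapping: {agreement(auto, manual):.2f}")
```

Labels that describe the same concept in different words (for example, "maternal age at delivery" versus "mother age at birth") may score poorly under a purely lexical matcher, so it can miss a pair or pick the wrong partner; this is one way incomplete or inconsistent documentation limits automation.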