Data harmonization and data pooling from cohort studies: a practical approach for data management.

Affiliations

Department of Community Health Sciences, University of Calgary, Calgary, Canada.

Applied Research and Evaluation - Primary Health Care, Alberta Health Services, Calgary, Canada.

Publication information

Int J Popul Data Sci. 2021 Nov 30;6(1):1680. doi: 10.23889/ijpds.v6i1.1680. eCollection 2021.

DOI:10.23889/ijpds.v6i1.1680

PMID:34888420

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8631396/

Abstract

Pooling data from pre-existing datasets can increase study sample size and statistical power when answering a research question. However, individual datasets may contain variables that measure the same construct differently, which poses challenges for data pooling. Variable harmonization, an approach that can generate comparable datasets from heterogeneous sources, can address this issue in some circumstances. As an illustrative example, this paper describes the data harmonization strategies that helped generate comparable datasets across two Canadian pregnancy cohort studies: All Our Families and the Alberta Pregnancy Outcomes and Nutrition study. Variables were harmonized considering multiple features across the datasets: the construct measured; the question asked and response options; the measurement scale used; the frequency of measurement; the timing of measurement; and the data structure. Completely matching, partially matching, and completely unmatching variables across the datasets were determined based on these features. Variables that were an exact match were pooled as is. Partially matching variables were harmonized or processed under a common format across the datasets considering the frequency of measurement, the timing of measurement, the measurement scale used, and response options. Variables that were completely unmatching could not be harmonized into a single variable. The variable harmonization strategies that were used to generate comparable cohort datasets for data pooling are applicable to other data sources. Future studies may employ or evaluate these strategies, which permit researchers to answer, in a statistically efficient, timely, and cost-efficient manner, novel research questions that could not be answered using a single data source.
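
The three-way classification and the recoding of partially matching variables can be illustrated with a minimal Python sketch. It is not the authors' implementation. The cohort labels, variable names, response categories, and the decision rule (a different construct means unmatching, while any other differing feature means partially matching) are all hypothetical assumptions made for illustration, and only four of the six features named in the abstract are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    """Features of one variable in one cohort (all values used here are hypothetical)."""
    construct: str           # what the variable measures
    scale: str               # measurement scale or instrument
    timing: str              # when the measurement was taken
    response_options: tuple  # allowed response categories


def classify_match(a: VariableSpec, b: VariableSpec) -> str:
    """Assumed rule: different construct -> unmatching; any other difference -> partial."""
    if a.construct != b.construct:
        return "unmatching"
    same_features = (
        a.scale == b.scale
        and a.timing == b.timing
        and a.response_options == b.response_options
    )
    return "complete" if same_features else "partial"


# Hypothetical recoding maps that collapse each cohort's categories
# into one shared response format.
RECODE = {
    "cohort_a": {
        "high school or less": "no post-secondary",
        "some college": "post-secondary",
        "university degree": "post-secondary",
    },
    "cohort_b": {
        "<= high school": "no post-secondary",
        "> high school": "post-secondary",
    },
}


def harmonize(records, cohort, variable="education"):
    """Recode one partially matching variable; unmapped values become None."""
    mapping = RECODE[cohort]
    return [
        {"cohort": cohort, "id": r["id"], variable: mapping.get(r[variable])}
        for r in records
    ]


def pool(*datasets):
    """Stack harmonized datasets into one pooled list of records."""
    return [row for dataset in datasets for row in dataset]
```

Keeping the source cohort as a column in the pooled records is a deliberate choice in this sketch, so that later analyses can adjust for or stratify by study.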

