Kezios Katrina L, Glymour M Maria, Zeki Al Hazzouri Adina
Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, USA.
Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA.
Curr Epidemiol Rep. 2025 Dec;12(1). doi: 10.1007/s40471-024-00355-1. Epub 2024 Nov 6.
Research on the drivers of health across the life course would ideally be based on diverse longitudinal cohorts that repeatedly collect detailed assessments of risk factors over the full life span. However, few extant data sources in the US possess these ideal features. A "longitudinal synthetic cohort" (a dataset created by stacking or linking multiple individual cohorts spanning different but overlapping periods of the life course) can overcome some of these limitations, leveraging the strengths of each component study. This type of synthetic cohort is especially useful for aging research; it enables description of the long-term natural history of disease and novel investigations of earlier-life factors and mechanisms shaping health outcomes that typically manifest in older age, such as Alzheimer's disease and related dementias (ADRD).
We review the current understanding of synthetic cohorts for life course research. We first discuss the chief advantages of longitudinal synthetic cohorts, focusing on their utility for aging/ADRD research to make the discussion concrete. We then summarize the conditions needed for valid inference in a synthetic cohort, which depend on the research goals. We end by highlighting key challenges to creating longitudinal synthetic cohorts and conducting life course research within them.
The idea of combining multiple data sources to investigate research questions that are not feasible to answer using a single cohort is gaining popularity in epidemiology. The use of longitudinal synthetic cohorts in applied research, and especially in ADRD research, has been limited, however, likely due to methodologic complexity. In particular, little guidance and few examples exist for the creation of a longitudinal synthetic cohort for causal research goals. While building synthetic cohorts requires much thought and care, it offers a tremendous opportunity to address novel and critical scientific questions that could not be examined in a single study.