Using sequence and cluster analysis to characterize variables that unfold over time: implementation and practical considerations for epidemiologists.

Authors

Pacca Lucia, Dang Kristina V, Koenig Leah, dP Duarte Catherine, Gaye S Amina, Harrati Amal, Vable Anusha M

Affiliations

Department of Family and Community Medicine, University of California San Francisco, 2540 23rd St, San Francisco, CA 94110, United States.

University of Southern California, Los Angeles, California 90007.

Publication information

Am J Epidemiol. 2025 Apr 10. doi: 10.1093/aje/kwaf065.

DOI:10.1093/aje/kwaf065

PMID:40219634

Abstract

Characterizing longitudinal trajectories of variables that unfold over time (e.g. social, health or environmental variables) is a persistent challenge, but can be accomplished with sequence and cluster analysis, data-driven approaches that can differentiate timing, order and duration of events. We present practical guidance on implementing sequence and cluster analysis for epidemiologists with the goal of providing clear advice on decision points and tradeoffs. We introduce the three main steps of sequence and cluster analysis: (1) coding trajectories of ordered events (data cleaning); (2) measuring dissimilarity between trajectories (sequence analysis); and (3) grouping similar trajectories (cluster analysis). Each of these steps presents researchers with several decision points, such as data cleaning rules, options for evaluating sequence dissimilarity, and choices of clustering algorithms. After outlining each of the sequence analysis steps, we provide an applied example of sequence analysis in which we create and group transition-to-retirement trajectories from age 51-75 for a sample of 9,189 Health and Retirement Study participants using self-reported employment information, then estimate the association between transition-to-retirement groups and self-rated health. We seek to provide an initial guide for epidemiologists through analytic decisions and implementation challenges of sequence analysis as this approach is increasingly implemented and undergoes methodological advances.
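The three steps named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six toy trajectories and their state codes are hypothetical (not Health and Retirement Study data), the dissimilarity measure is optimal matching with an insertion/deletion cost of 1 and a substitution cost of 2, and the grouping uses average-linkage agglomerative clustering with three groups. The abstract does not say which measure or algorithm the authors chose, so this should not be read as their method.

```python
from itertools import combinations

# Step 1 (coding trajectories): one state per survey wave.
# Hypothetical toy data: F = full-time work, P = part-time work, R = retired.
trajectories = {
    "a": "FFFFFFRRRR",
    "b": "FFFFFRRRRR",
    "c": "FFFPPPRRRR",
    "d": "FFPPPPPRRR",
    "e": "RRRRRRRRRR",
    "f": "FRRRRRRRRR",
}


def om_distance(s, t, indel=1.0, sub=2.0):
    """Step 2: optimal matching (edit) distance between two state sequences."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel]
        for j, b in enumerate(t, start=1):
            cost = 0.0 if a == b else sub
            curr.append(min(prev[j] + indel,      # delete a
                            curr[j - 1] + indel,  # insert b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def average_linkage(ids, dist, k):
    """Step 3: merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in ids]

    def between(c1, c2):
        # Mean pairwise distance between members of two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: between(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


ids = sorted(trajectories)
dist = {frozenset((x, y)): om_distance(trajectories[x], trajectories[y])
        for x, y in combinations(ids, 2)}
groups = average_linkage(ids, dist, k=3)
print(groups)  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

In this toy example the groups correspond to a direct move from full-time work to retirement, a move through part-time work, and early retirement. Changing the costs, the dissimilarity measure, or the number of groups can change the result, which is the kind of decision point the abstract refers to.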

Cited by

Thirty-Year Glycemic Trajectories From Young Adulthood Through Middle Age.

JAMA Netw Open. 2025 Jun 2;8(6):e2517455. doi: 10.1001/jamanetworkopen.2025.17455.
