Pi Selina, Goldhaber-Fiebert Jeremy D, Alarid-Escudero Fernando
Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA.
Department of Health Policy, School of Medicine, Center for Health Policy, Freeman-Spogli Institute for International Studies, Stanford University, Stanford, CA, USA.
medRxiv. 2025 May 2:2025.04.30.25326766. doi: 10.1101/2025.04.30.25326766.
Microsimulation models generate individual life trajectories that must be summarized as population-level outcomes for model calibration and validation. Established formulas exist for calculating outcomes such as prevalence, incidence, and lifetime risk from cross-sectional and short-term longitudinal studies, but guidance on calculating these outcomes from long-term longitudinal data is limited because large-scale studies covering events across the human lifespan are rare. This technical report presents methods for calculating epidemiological outcomes from simulated longitudinal data, ranging from replicating a real-world study design to fully incorporating longitudinal disease and exposure durations. We provide an open-source code base with functions in R to calculate the prevalence, incidence, age-conditional risk, lifetime risk, and disease-specific mortality of a condition from individual-level time-to-event data. In addition, we provide guidance and code for calculating cancer-related outcomes from individual-level data, such as the stage distribution at diagnosis, the distribution of precancerous lesion multiplicity, and the mean dwell and sojourn times. Because certain outcomes can be formulated in several ways, we call for increased transparency in reporting how summary outcomes are derived from microsimulation model outputs, and we anticipate that this report will facilitate the calculation of epidemiological outcomes in both simulated and real-world data.
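To make the idea of summarizing individual-level time-to-event data concrete, the following is a minimal Python sketch of one possible formulation of point prevalence, an age-interval incidence rate, and lifetime risk. It is not the authors' R code base: the `Person` record, the function names, and the four-person cohort are hypothetical and chosen only for illustration, and the sketch assumes that each simulated individual has a known age at death and at most one disease onset that does not occur after death.

```python
from dataclasses import dataclass
from math import inf
from typing import Optional, Sequence


@dataclass
class Person:
    """Hypothetical record of one simulated life trajectory."""
    onset_age: Optional[float]  # age at disease onset, None if never diseased
    death_age: float            # age at death from any cause


def point_prevalence(people: Sequence[Person], age: float) -> float:
    """Share of people alive at `age` whose onset occurred at or before `age`."""
    alive = [p for p in people if p.death_age > age]
    if not alive:
        return float("nan")
    cases = sum(1 for p in alive if p.onset_age is not None and p.onset_age <= age)
    return cases / len(alive)


def incidence_rate(people: Sequence[Person], age_start: float, age_end: float) -> float:
    """New onsets in (age_start, age_end] per person-year at risk in that interval."""
    events = 0
    person_years = 0.0
    for p in people:
        onset = p.onset_age if p.onset_age is not None else inf
        # Time at risk stops at onset, death, or the end of the interval.
        exit_age = min(onset, p.death_age, age_end)
        person_years += max(0.0, exit_age - age_start)
        if age_start < onset <= age_end and onset <= p.death_age:
            events += 1
    return events / person_years if person_years > 0 else float("nan")


def lifetime_risk(people: Sequence[Person]) -> float:
    """Share of the cohort that ever develops the condition before death."""
    return sum(1 for p in people if p.onset_age is not None) / len(people)


# Hypothetical four-person cohort, for illustration only.
cohort = [
    Person(onset_age=50, death_age=70),
    Person(onset_age=None, death_age=80),
    Person(onset_age=65, death_age=66),
    Person(onset_age=None, death_age=55),
]
print(point_prevalence(cohort, 60))    # 1 case among 3 people alive at age 60
print(incidence_rate(cohort, 60, 70))  # 1 onset over 15 person-years at risk
print(lifetime_risk(cohort))           # 2 of 4 people ever diseased
```

Other denominators are equally defensible, for example counting everyone alive at the start of the interval rather than person-years at risk, which is one reason the choice of formulation should be reported alongside the result.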