Evaluation of the impact of defining observable time in real-world data on outcome incidence.

Authors

Blacketer Clair, DeFalco Frank J, Conover Mitchell M, Ryan Patrick B, Schuemie Martijn J, Rijnbeek Peter R

Affiliations

Coordinating Center, Observational Health Data Sciences and Informatics (OHDSI), New York, NY, 10032, United States.

Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, 3015 GD, The Netherlands.

Publication information

J Am Med Inform Assoc. 2025 Sep 1;32(9):1434-1444. doi: 10.1093/jamia/ocaf119.

DOI:10.1093/jamia/ocaf119

PMID:40694804

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12361855/

Abstract

OBJECTIVE

In real-world data (RWD), defining the observation period (the time during which a patient is considered observable) is critical for estimating incidence rates (IRs) and other outcomes. Yet, in the absence of explicit enrollment information, this period must often be inferred, introducing potential bias.

MATERIALS AND METHODS

This study evaluates methods for defining observation periods and their impact on IR estimates across multiple database types. We applied 3 methods for defining observation periods: (1) a persistence + surveillance window approach, (2) an age- and gender-adjusted method based on time between healthcare events, and (3) the min/max method. These were tested across 11 RWD databases, including both enrollment-based and encounter-based sources. Enrollment time was used as the reference standard in eligible databases. To assess the impact on epidemiologic results, we replicated a prior study of adverse event incidence, comparing IRs and calculating mean squared error between methods.
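To make two of these definitions concrete, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not spell out the exact rules. The sketch assumes that the min/max method spans a person's first to last recorded event, that the persistence window is the largest gap between consecutive events that still keeps them in one observation period, and that the surveillance window is time appended after the last event of each period. The function and parameter names are hypothetical.

```python
from datetime import date, timedelta


def min_max_period(event_dates):
    """Assumed min/max rule: one period from the first to the last recorded event."""
    return [(min(event_dates), max(event_dates))]


def persistence_surveillance_periods(event_dates, persistence_days, surveillance_days):
    """Assumed persistence + surveillance rule (illustrative only).

    Consecutive events no more than `persistence_days` apart are chained into
    one period; each period end is extended by `surveillance_days`.
    """
    if surveillance_days > persistence_days:
        # Keeps extended periods from overlapping the next period in this sketch.
        raise ValueError("surveillance_days must not exceed persistence_days")
    dates = sorted(event_dates)
    extension = timedelta(days=surveillance_days)
    periods = []
    start = end = dates[0]
    for d in dates[1:]:
        if (d - end).days <= persistence_days:
            end = d  # gap is small enough: same observation period
        else:
            periods.append((start, end + extension))
            start = end = d  # gap too long: open a new period
    periods.append((start, end + extension))
    return periods


events = [date(2020, 1, 1), date(2020, 3, 1), date(2021, 6, 1)]
print(min_max_period(events))
print(persistence_surveillance_periods(events, persistence_days=180, surveillance_days=30))
```

In this toy example the min/max rule counts the 15-month gap between the second and third events as observed time, while the persistence + surveillance rule splits the history into two shorter periods.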

RESULTS

Incidence rates decreased as observation periods lengthened, driven by increases in the person-time denominator. The persistence + surveillance method produced estimates closest to enrollment-based rates when appropriately balanced. The min/max approach yielded inconsistent results, particularly in encounter-based databases, with greater error observed in databases with longer time spans.
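The denominator effect can be illustrated with a small Python sketch using invented numbers. It assumes the usual definition of an incidence rate as events divided by person-time, holds the event count fixed as a simplification, and computes a plain mean squared error on the rate scale. The abstract does not state how the authors scaled or aggregated their error metric, so the helper names and all values below are hypothetical.

```python
def incidence_rate(n_events, person_years, per=1000):
    """Events per `per` person-years of observable time."""
    return per * n_events / person_years


def mean_squared_error(estimates, references):
    """Plain MSE between estimated and reference rates (illustrative scale)."""
    if len(estimates) != len(references):
        raise ValueError("inputs must have the same length")
    return sum((e - r) ** 2 for e, r in zip(estimates, references)) / len(estimates)


# Same 10 events, but a longer inferred observation period doubles person-time.
ir_short = incidence_rate(10, person_years=1000)  # 10.0 per 1000 person-years
ir_long = incidence_rate(10, person_years=2000)   # 5.0 per 1000 person-years

# Hypothetical enrollment-based reference rates for two outcomes.
reference = [8.0, 8.0]
print(ir_short, ir_long, mean_squared_error([ir_short, ir_long], reference))
```

With the event count unchanged, doubling the person-time halves the rate, which is the mechanism the results describe.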

DISCUSSION

These findings suggest that assumptions about data completeness and population observability significantly affect incidence estimates. Observation period definitions substantially influence outcome measurement in RWD studies.

CONCLUSION

Standardized, transparent approaches are necessary to ensure valid, reproducible results, especially in databases lacking defined enrollment.
