An evolutionary algorithm for the direct optimization of covariate balance between nonrandomized populations.

Affiliation

Medical Affairs and Pharmacovigilance, Bayer AG, Berlin, Germany.

Publication information

Pharm Stat. 2024 May-Jun;23(3):288-307. doi: 10.1002/pst.2352. Epub 2023 Dec 18.

Abstract

Matching reduces confounding bias in comparing the outcomes of nonrandomized patient populations by removing systematic differences between them. Under very basic assumptions, propensity score (PS) matching can be shown to eliminate bias entirely in estimating the average treatment effect on the treated. In practice, misspecification of the PS model leads to deviations from theory and matching quality is ultimately judged by the observed post-matching balance in baseline covariates. Since covariate balance is the ultimate arbiter of successful matching, we argue for an approach to matching in which the success criterion is explicitly specified and describe an evolutionary algorithm to directly optimize an arbitrary metric of covariate balance. We demonstrate the performance of the proposed method using a simulated dataset of 275,000 patients and 10 matching covariates. We further apply the method to match 250 patients from a recently completed clinical trial to a pool of more than 160,000 patients identified from electronic health records on 101 covariates. In all cases, we find that the proposed method outperforms PS matching as measured by the specified balance criterion. We additionally find that the evolutionary approach can perform comparably to another popular direct optimization technique based on linear integer programming, while having the additional advantage of supporting arbitrary balance metrics. We demonstrate how the chosen balance metric impacts the statistical properties of the resulting matched populations, emphasizing the potential impact of using nonlinear balance functions in constructing an external control arm. We release our implementation of the considered algorithms in Python.
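
To make the idea of directly optimizing a balance metric concrete, here is a minimal sketch of an evolutionary search over candidate control subsets. It is not the authors' released implementation: the function names (`balance_loss`, `evolve_match`), the choice of mean absolute standardized mean difference as the metric, and the simple scheme of truncation selection plus swap mutation are all assumptions made for illustration. Any other function of the two covariate matrices could be substituted for `balance_loss`.

```python
import numpy as np


def balance_loss(treated, controls):
    """Mean absolute standardized mean difference over covariates.

    Illustrative metric only; any function of the two matrices would work.
    """
    pooled_sd = np.sqrt(
        (treated.var(axis=0, ddof=1) + controls.var(axis=0, ddof=1)) / 2
    )
    diff = treated.mean(axis=0) - controls.mean(axis=0)
    return float(np.mean(np.abs(diff) / pooled_sd))


def evolve_match(treated, pool, n_match, pop_size=30, n_gen=100, n_swap=2, seed=0):
    """Search for n_match rows of `pool` that balance against `treated`.

    Hypothetical sketch: each candidate is an array of distinct pool indices.
    The better half of each generation survives unchanged (elitism) and each
    survivor spawns one child by swapping n_swap members for unused rows.
    Assumes pop_size is even and the pool has at least n_match + n_swap rows.
    """
    rng = np.random.default_rng(seed)
    n_pool = pool.shape[0]

    def fitness(idx):
        return balance_loss(treated, pool[idx])

    population = [
        rng.choice(n_pool, size=n_match, replace=False) for _ in range(pop_size)
    ]
    history = []  # best loss seen at the start of each generation
    for _ in range(n_gen):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent.copy()
            # Mutation: replace a few selected controls with unselected ones.
            unused = np.setdiff1d(np.arange(n_pool), child)
            positions = rng.choice(n_match, size=n_swap, replace=False)
            child[positions] = rng.choice(unused, size=n_swap, replace=False)
            children.append(child)
        population = survivors + children

    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the survivors are carried over unchanged, the best loss can never get worse from one generation to the next. The sketch recomputes the loss for every candidate in each generation, which is fine for small examples but would need caching or incremental updates at the scale of the pools described in the abstract.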
