Multiple self-controlled case series for large-scale longitudinal observational databases.

Author information

Simpson Shawn E, Madigan David, Zorych Ivan, Schuemie Martijn J, Ryan Patrick B, Suchard Marc A

Affiliation

Department of Statistics, Columbia University, New York, New York, U.S.A.

Publication information

Biometrics. 2013 Dec;69(4):893-902. doi: 10.1111/biom.12078. Epub 2013 Oct 11.

DOI:10.1111/biom.12078

PMID:24117144

Abstract

Characterization of relationships between time-varying drug exposures and adverse events (AEs) related to health outcomes represents the primary objective in postmarketing drug safety surveillance. Such surveillance increasingly utilizes large-scale longitudinal observational databases (LODs), containing time-stamped patient-level medical information including periods of drug exposure and dates of diagnoses for millions of patients. Statistical methods for LODs must confront computational challenges related to the scale of the data, and must also address confounding and other biases that can undermine efforts to estimate effect sizes. Methods that compare on-drug with off-drug periods within patient offer specific advantages over between-patient analysis on both counts. To accomplish these aims, we extend the self-controlled case series (SCCS) for LODs. SCCS implicitly controls for fixed multiplicative baseline covariates since each individual acts as their own control. In addition, only exposed cases are required for the analysis, which is computationally advantageous. The standard SCCS approach is usually used to assess single drugs and therefore estimates marginal associations between individual drugs and particular AEs. Such analyses ignore confounding drugs and interactions and have the potential to give misleading results. In order to avoid these difficulties, we propose a regularized multiple SCCS approach that incorporates potentially thousands or more time-varying confounders such as other drugs. The approach successfully handles the high dimensionality and can provide a sparse solution via an L₁ regularizer. We present details of the model and the associated optimization procedure, as well as results of empirical investigations.
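
As an illustration of the idea in the abstract, the following is a minimal sketch of an L₁-regularized multiple SCCS fit, not the authors' implementation. It assumes the usual SCCS setup: each patient's observation time is split into intervals, event counts in each interval are Poisson with a patient-specific baseline rate, and conditioning on each patient's total event count removes that baseline, leaving a multinomial likelihood over intervals. The function names, the toy data, and the use of proximal gradient descent (soft-thresholding) are assumptions made for brevity; the abstract does not say which optimizer the paper uses.

```python
import numpy as np

def sccs_nll_and_grad(beta, X, y, length, person):
    """Negative conditional log-likelihood of a multiple-drug SCCS model.

    X      : (rows, drugs) exposure indicators, one row per patient-interval
    y      : (rows,) event counts in each interval
    length : (rows,) interval lengths (e.g. days)
    person : (rows,) integer patient index, 0..P-1
    """
    eta = X @ beta + np.log(length)            # log relative intensity
    n_people = person.max() + 1
    # Per-patient log-sum-exp of eta, computed stably.
    m = np.full(n_people, -np.inf)
    np.maximum.at(m, person, eta)
    w = np.exp(eta - m[person])
    denom = np.bincount(person, weights=w, minlength=n_people)
    n_events = np.bincount(person, weights=y, minlength=n_people)
    nll = -(y @ eta) + n_events @ (m + np.log(denom))
    prob = w / denom[person]                   # within-patient interval probabilities
    grad = -X.T @ (y - n_events[person] * prob)
    return nll, grad

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_l1_sccs(X, y, length, person, lam, n_iter=2000):
    """Proximal gradient (ISTA) for the L1-penalized conditional likelihood."""
    # Conservative step size from an upper bound on the Hessian's norm.
    step = 1.0 / (y.sum() * max(1.0, (X ** 2).sum(axis=1).max()))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad = sccs_nll_and_grad(beta, X, y, length, person)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Hypothetical toy data: 2 patients, 2 drugs (columns), 5 patient-intervals.
X = np.array([[1, 0], [0, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
y = np.array([1, 0, 1, 0, 1], dtype=float)
length = np.array([10, 50, 5, 20, 80], dtype=float)
person = np.array([0, 0, 0, 1, 1])

beta_hat = fit_l1_sccs(X, y, length, person, lam=0.5)
print(beta_hat)  # log rate-ratio estimates; exact zeros mean "drug dropped"
```

Each entry of `beta_hat` is a log incidence rate ratio for one drug, adjusted for the others. Larger values of `lam` push more coefficients to exactly zero, which is the sparsity property the abstract attributes to the L₁ regularizer.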
