Schuster Tibor, Pang Menglan, Platt Robert W
Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada.
Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital, Montreal, Quebec, Canada.
Pharmacoepidemiol Drug Saf. 2015 Sep;24(9):1004-7. doi: 10.1002/pds.3773. Epub 2015 Apr 10.
The high-dimensional propensity score algorithm attempts to improve control of confounding in typical treatment effect studies in pharmacoepidemiology and is increasingly used to analyze large administrative databases. Within this multi-step variable selection algorithm, the marginal prevalence of non-zero covariate values is treated as an indicator of a count variable's potential confounding impact. We investigate how the marginal prevalence of a confounder relates to the magnitude of the bias it can induce when estimating risk ratios in point exposure studies with binary outcomes.
We apply the law of total probability in conjunction with an established bias formula to derive and illustrate relative bias boundaries with respect to marginal confounder prevalence.
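The abstract does not state which bias formula is used. As a sketch only, assume the classical formula for a single binary confounder C, a binary exposure E, and a confounder-outcome risk ratio RR_CD that is the same in exposed and unexposed subjects. All symbols below are notation introduced here for illustration, not taken from the paper. Combining that formula with the law of total probability gives the feasible range of the confounder-exposure association for fixed marginal prevalences:

```latex
% Notation (assumed): p_C = P(C=1), p_E = P(E=1),
% p_1 = P(C=1 | E=1), p_0 = P(C=1 | E=0).
\begin{align}
  \text{relative bias}
    &= \frac{RR_{\text{crude}}}{RR_{\text{true}}}
     = \frac{1 + p_1\,(RR_{CD} - 1)}{1 + p_0\,(RR_{CD} - 1)}, \\
  p_C &= p_1\,p_E + p_0\,(1 - p_E)
    \quad\Longrightarrow\quad
    p_0 = \frac{p_C - p_1\,p_E}{1 - p_E}, \\
  \max\!\left\{0,\; \frac{p_C - (1 - p_E)}{p_E}\right\}
    &\le p_1 \le
  \min\!\left\{1,\; \frac{p_C}{p_E}\right\}.
\end{align}
```

Under these assumptions the relative bias is monotone in p_1 once p_0 is substituted, so its extremes over the feasible range are attained at the two endpoints of the last inequality.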
We show that the maximum possible bias magnitude can occur at any marginal prevalence level of a binary confounder variable. In particular, we demonstrate that when the exposure is rare or very common, confounders with low or high prevalence can still have a large confounding impact on estimated risk ratios.
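A small numerical sketch can make this result concrete. It is not the authors' code. It assumes the classical bias formula for one binary confounder, bias = (1 + p1(RR_CD - 1)) / (1 + p0(RR_CD - 1)), where p1 and p0 are the confounder prevalences among exposed and unexposed subjects. The function names `confounding_bias` and `bias_bounds` and all numeric inputs are hypothetical.

```python
def confounding_bias(p1, p0, rr_cd):
    """Relative bias of the crude risk ratio (assumed classical formula).

    p1, p0: confounder prevalence among exposed / unexposed subjects.
    rr_cd:  confounder-outcome risk ratio, assumed equal in both groups.
    """
    return (1 + p1 * (rr_cd - 1)) / (1 + p0 * (rr_cd - 1))


def bias_bounds(p_c, p_e, rr_cd):
    """Smallest and largest relative bias compatible with the marginal
    confounder prevalence p_c and exposure prevalence p_e (0 < p_e < 1)."""
    # Law of total probability: p_c = p1 * p_e + p0 * (1 - p_e).
    # Requiring 0 <= p0 <= 1 restricts p1 to this interval.
    p1_min = max(0.0, (p_c - (1 - p_e)) / p_e)
    p1_max = min(1.0, p_c / p_e)

    def p0_given(p1):
        p0 = (p_c - p1 * p_e) / (1 - p_e)
        return min(max(p0, 0.0), 1.0)  # guard against rounding error

    # The bias is monotone in p1, so the extremes sit at the endpoints.
    ends = [confounding_bias(p1, p0_given(p1), rr_cd) for p1 in (p1_min, p1_max)]
    return min(ends), max(ends)


if __name__ == "__main__":
    # Hypothetical inputs: a confounder with 1% prevalence and RR_CD = 5.
    for p_e in (0.01, 0.5):
        lower, upper = bias_bounds(p_c=0.01, p_e=p_e, rr_cd=5.0)
        print(f"exposure prevalence {p_e}: bias between {lower:.3f} and {upper:.3f}")
```

With these illustrative inputs, the upper bound equals RR_CD itself when the exposure is as rare as the confounder, because every exposed subject can then carry the confounder. With a 50% exposure prevalence the same confounder can bias the risk ratio by at most a few percent.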
Covariate pre-selection by prevalence may lead to sub-optimal confounder sampling within the high-dimensional propensity score algorithm. While we believe that the high-dimensional propensity score has important benefits in large-scale pharmacoepidemiologic studies, we recommend omitting the prevalence-based empirical identification of candidate covariates.