Determining the Probability Distribution and Evaluating Sensitivity and False Positive Rate of a Confounder Detection Method Applied To Logistic Regression.

Authors

Bliss Robin, Weinberg Janice, Webster Thomas, Vieira Veronica

Affiliations

Department of Environmental Health, Boston University School of Public Health, Boston, MA, USA; Orthopedic and Arthritis Center for Outcomes Research, Brigham and Women's Hospital/Harvard Medical School, Boston, MA, USA.

Publication information

J Biom Biostat. 2012 May 23;3(4):142. doi: 10.4172/2155-6180.1000142.

DOI:10.4172/2155-6180.1000142

PMID:23420565

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3571096/

Abstract

BACKGROUND

In epidemiologic studies, researchers are often interested in detecting confounding (when a third variable is both associated with and affects associations between the outcome and predictors). Confounder detection methods often compare regression coefficients obtained from "crude" models that exclude the possible confounder(s) and "adjusted" models that include the variable(s). One such method compares the relative difference in effect estimates to a cutoff of 10%, with differences of at least 10% providing evidence of confounding.

METHODS

In this study, we derive the asymptotic distribution of the relative change in effect statistic applied to logistic regression and evaluate the sensitivity and false positive rate of the 10% cutoff method using the asymptotic distribution. We then verify the results using simulated data.

RESULTS

For logistic regression models with a dichotomous outcome, exposure, and possible confounder, we found that the relative change statistic underlying the 10% cutoff method has an asymptotic lognormal distribution. For sample sizes of at least 300, we found that when confounding existed, over 80% of models had >10% changes in odds ratios. When the confounder was not associated with the outcome, the false positive rate increased as the strength of the association between the predictor and confounder increased. When the confounder and predictor were independent of one another, false positives were rare (most rates below 10%).

CONCLUSIONS

Researchers must be aware of high false positive rates when applying change-in-estimate confounder detection methods to data where the exposure is associated with possible confounder variables.
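
The 10% cutoff rule described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`solve`, `fit_logistic`, `change_in_estimate`), the made-up cell counts, the plain Newton-Raphson fit (standing in for any statistical package), and the convention of dividing the change by the adjusted odds ratio. The sketch fits a crude model (exposure only) and an adjusted model (exposure plus candidate confounder) to grouped dichotomous data and flags confounding when the odds ratio changes by at least 10%.

```python
import math

def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(groups, iters=25):
    """Newton-Raphson logistic fit for grouped binary data.

    groups: list of (covariates, cases, total); covariates exclude the intercept.
    Returns the coefficients [intercept, beta_1, ...].
    """
    k = len(groups[0][0]) + 1
    beta = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for covs, cases, total in groups:
            x = [1.0] + list(covs)
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                score[i] += (cases - total * p) * x[i]
                for j in range(k):
                    info[i][j] += total * p * (1.0 - p) * x[i] * x[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def change_in_estimate(counts, cutoff=0.10):
    """counts[(x, z)] = (cases, total) for exposure x and candidate confounder z.

    Assumed convention: relative change = |OR_crude - OR_adj| / OR_adj.
    """
    crude = []
    for x in (0, 1):
        cases = counts[(x, 0)][0] + counts[(x, 1)][0]
        total = counts[(x, 0)][1] + counts[(x, 1)][1]
        crude.append(((x,), cases, total))
    adjusted = [((x, z), counts[(x, z)][0], counts[(x, z)][1])
                for x in (0, 1) for z in (0, 1)]
    or_crude = math.exp(fit_logistic(crude)[1])
    or_adj = math.exp(fit_logistic(adjusted)[1])
    rel_change = abs(or_crude - or_adj) / or_adj
    return or_crude, or_adj, rel_change, rel_change >= cutoff

# Hypothetical counts: the exposure odds ratio is 2 within each stratum of z,
# while z is associated with both the exposure and the outcome.
counts = {(0, 0): (10, 110), (1, 0): (10, 60),
          (0, 1): (20, 60), (1, 1): (50, 100)}
or_crude, or_adj, rel_change, flagged = change_in_estimate(counts)
print(f"crude OR={or_crude:.2f}, adjusted OR={or_adj:.2f}, "
      f"relative change={rel_change:.0%}, flagged={flagged}")
```

In this constructed example the crude odds ratio is 2.8 and the adjusted odds ratio is 2.0, a 40% change, so the rule flags the third variable. Note that the ratio of the two odds ratios is the exponential of a difference between two log odds ratio estimates, which is consistent with the lognormal limiting distribution reported in the abstract.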
