Zhang Qi, MacLehose Richard F, Collin Lindsay J, Ahern Thomas P, Lash Timothy L
From the Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA.
Division of Epidemiology & Community Health, School of Public Health, University of Minnesota, Minneapolis, MN.
Epidemiology. 2025 Mar 1;36(2):237-244. doi: 10.1097/EDE.0000000000001818. Epub 2024 Nov 26.
To account for misclassification of dichotomous variables using probabilistic bias analysis, beta distributions are often assigned to bias parameters (e.g., positive and negative predictive values) based on data from an internal validation substudy. Due to the small sample size of validation substudies, zero-cell frequencies can occur. In these scenarios, it may be helpful to assign prior distributions or apply continuity corrections to the predictive value estimates.
We simulated cohort studies of varying sizes, with a binary exposure and outcome and a true risk ratio (RR) = 2.0, as well as internal validation substudies, to account for exposure misclassification. We conducted bias adjustment under five approaches assigning prior distributions to the positive and negative predictive value parameters: (1) conventional method (i.e., no prior), (2) uniform prior beta(α = 1, β = 1), (3) Jeffreys prior beta(α = 0.5, β = 0.5), (4) using Jeffreys prior as a continuity correction only when zero cells occurred, and (5) using the uniform prior as a continuity correction only when zero cells occurred. We evaluated performance by measuring coverage probability, bias, and mean squared error.
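As an illustration of how the five approaches might differ in the beta parameters assigned to a single predictive value, the following minimal Python sketch may help. It is not the authors' code. The function names, the method labels, and the parameterization are assumptions: the conventional method is taken to use the two validation counts directly as beta(a, b), each prior is taken to add its α and β to those counts, and a continuity correction is taken to add the prior to both cells when either one is zero.

```python
from typing import Tuple

# Hypothetical labels for the five approaches. Each maps to
# (amount added to both cells, apply only when a zero cell is present).
METHODS = {
    "conventional": (0.0, False),  # (1) no prior
    "uniform": (1.0, False),       # (2) beta(1, 1) prior
    "jeffreys": (0.5, False),      # (3) beta(0.5, 0.5) prior
    "jeffreys_cc": (0.5, True),    # (4) Jeffreys prior only if a zero cell occurs
    "uniform_cc": (1.0, True),     # (5) uniform prior only if a zero cell occurs
}


def beta_params(n_confirmed: int, n_refuted: int, method: str) -> Tuple[float, float]:
    """Return (alpha, beta) for one predictive value.

    n_confirmed: validated records whose classification was correct.
    n_refuted:   validated records whose classification was wrong.
    Assumption: the prior is added to both cells.
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    addend, only_if_zero = METHODS[method]
    has_zero_cell = n_confirmed == 0 or n_refuted == 0
    if only_if_zero and not has_zero_cell:
        addend = 0.0
    return (n_confirmed + addend, n_refuted + addend)


def is_proper(params: Tuple[float, float]) -> bool:
    """A beta distribution requires both parameters to be strictly positive."""
    return params[0] > 0 and params[1] > 0


if __name__ == "__main__":
    # Sparse validation data: 10 classifications confirmed, 0 refuted.
    for name in METHODS:
        params = beta_params(10, 0, name)
        print(name, params, "proper" if is_proper(params) else "not a proper beta")
```

Under these assumptions, a zero cell leaves the conventional parameters without a proper beta distribution, while the other four approaches give strictly positive parameters that can be sampled from. When no cell is zero, approaches (4) and (5) reduce to the conventional parameters.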
For sparse validation data, methods (2)-(5) all had better coverage and lower mean squared error than the conventional method, with the uniform prior (2) yielding the best performance. However, little difference between methods was observed when the validation substudy did not contain zero cells.
If sparse data are expected in a validation substudy, using a uniform prior for the beta distribution of bias parameters can improve the validity of bias-adjusted measures.