Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
Biostatistics Innovation Group, Gilead Sciences, Foster City, CA, USA.
Sci Adv. 2024 May 31;10(22):eadj0266. doi: 10.1126/sciadv.adj0266.
Selection bias poses a substantial challenge to valid statistical inference from nonprobability samples. This study compared estimates of first-dose COVID-19 vaccination rates among Indian adults in 2021 from a large nonprobability sample, the COVID-19 Trends and Impact Survey (CTIS), and a small probability survey, the Center for Voting Options and Trends in Election Research (CVoter), against national benchmark data from the COVID Vaccine Intelligence Network. Notably, CTIS exhibited a larger average estimation error (0.37) than CVoter (0.14). Additionally, we explored the accuracy, in terms of mean squared error, of CTIS in estimating successive differences over time and subgroup differences (females versus males) in mean vaccine uptake. Compared with estimating the overall vaccination rate, targeting these alternative estimands, which are differences or relative differences between two means, increased the effective sample size. These results suggest that the Big Data Paradox can manifest in countries beyond the United States and may not apply equally to every estimand of interest.
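The abstract does not say how effective sample size was computed. One common formalization in the Big Data Paradox literature is Meng's (2018) data defect correlation, under which the error of a sample mean factors as rho * sqrt((N - n) / n) * sigma. The sketch below assumes that formulation and a binary outcome such as vaccinated or not vaccinated. The function names and the numbers in the usage example are hypothetical and are not taken from the study.

```python
import math


def data_defect_correlation(estimate, truth, n, N):
    """Back out rho from Meng's identity:
    estimate - truth = rho * sqrt((N - n) / n) * sigma.

    Assumes a binary outcome, so sigma = sqrt(truth * (1 - truth)).
    n is the sample size and N is the population size.
    """
    sigma = math.sqrt(truth * (1.0 - truth))
    return (estimate - truth) / (math.sqrt((N - n) / n) * sigma)


def effective_sample_size(estimate, truth, n, N):
    """Approximate size of a simple random sample with the same squared
    error, n_eff = (n / (N - n)) / rho**2, ignoring the finite population
    correction. Returns infinity when the estimate equals the truth.
    """
    rho = data_defect_correlation(estimate, truth, n, N)
    if rho == 0.0:
        return math.inf
    return (n / (N - n)) / rho ** 2


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not values from the paper.
    n, N = 100_000, 1_000_000_000
    estimate, truth = 0.60, 0.50
    print("rho   =", data_defect_correlation(estimate, truth, n, N))
    print("n_eff =", effective_sample_size(estimate, truth, n, N))
```

Under these assumptions, n_eff reduces algebraically to sigma^2 / (estimate - truth)^2, so a large sample with a modest bias can carry no more information than a very small random sample. An estimand whose bias partly cancels, such as a difference between two means, would yield a larger n_eff under the same formula.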