Tourangeau Roger, Sun Hanyu, Yan Ting
Methodology, Westat, 1600 Research Boulevard, Rockville, MD 20850, USA.
Statistical Department, Westat, 1600 Research Boulevard, Rockville, MD 20850, USA.
J Surv Stat Methodol. 2020 Sep 8;9(4):651-673. doi: 10.1093/jssam/smaa018. eCollection 2021 Sep.
The usual method for assessing the reliability of survey data has been to conduct reinterviews a short interval (such as one to two weeks) after an initial interview and to use these data to estimate relatively simple statistics, such as gross difference rates (GDRs). More sophisticated approaches have also been used to estimate reliability. These include estimates from multitrait-multimethod (MTMM) experiments, models applied to longitudinal data, and latent class analyses. To our knowledge, no prior study has systematically compared these different methods for assessing reliability. The Population Assessment of Tobacco and Health Reliability and Validity (PATH-RV) Study, conducted with a national probability sample, assessed the reliability of answers to the Wave 4 questionnaire from the PATH Study. Respondents in the PATH-RV Study were interviewed twice, about two weeks apart. We examined whether the classic survey approach yielded different conclusions from the more sophisticated methods. We also examined two methods for assessing problems with survey questions, along with item nonresponse rates and response times, to see how strongly these related to the different reliability estimates. We found that kappa was highly correlated with both GDRs and over-time correlations, but the latter two statistics were less highly correlated with each other, particularly for adult respondents; estimates from longitudinal analyses of the same items in the main PATH Study were also highly correlated with the traditional reliability estimates. The latent class analysis results, based on fewer items, also showed a high level of agreement with the traditional measures. The other methods and indicators had at best weak relationships with the reliability estimates derived from the reinterview data. Although the Question Understanding Aid seems to tap a different factor from the other measures, it did predict item nonresponse and response latencies for adult respondents and thus may be a useful adjunct to the traditional measures.
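A minimal sketch of the standard definitions assumed here, for a dichotomous item cross-tabulated between interview and reinterview with cell proportions p_{ij} (rows indexing the interview response, columns the reinterview response; the notation is illustrative, not taken from the article):

\[
  \mathrm{GDR} = p_{12} + p_{21},
  \qquad
  \kappa = \frac{p_o - p_e}{1 - p_e},
  \quad\text{where } p_o = p_{11} + p_{22},\;
  p_e = p_{1\cdot}\,p_{\cdot 1} + p_{2\cdot}\,p_{\cdot 2}.
\]

Lower GDRs and higher values of kappa indicate more reliable answers; because kappa discounts the agreement expected by chance, the two statistics can diverge for items with skewed response distributions.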