How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?

Authors

West Brady T, Sakshaug Joseph W, Aurelien Guy Alain S

Affiliations

Survey Research Center, Institute for Social Research, University of Michigan-Ann Arbor, Ann Arbor, Michigan, United States of America.

Cathie Marsh Institute for Social Research, University of Manchester, Manchester, England.

Publication information

PLoS One. 2016 Jun 29;11(6):e0158120. doi: 10.1371/journal.pone.0158120. eCollection 2016.

Abstract

Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data.
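To make concrete the kind of analytic error the abstract describes, the following is a minimal sketch in Python. It is not the authors' analysis and does not use SESTAT data: the two strata, their population sizes, and the sampled values are all hypothetical, and the function names are invented for illustration. It assumes simple random sampling within each stratum and ignores the finite population correction. The sketch compares a naive estimate that treats the sample as a simple random sample with a design-based estimate that weights each stratum by its population share.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical stratified sample: stratum -> (population size N_h, sampled values).
# Stratum B is heavily oversampled relative to its share of the population.
strata = {
    "A": (9000, [8.0, 9.0, 10.0, 11.0, 12.0]),
    "B": (1000, [48.0, 49.0, 50.0, 51.0, 52.0]),
}

def naive_estimate(strata):
    """Mean and standard error computed as if the data were one simple random sample."""
    values = [y for _, ys in strata.values() for y in ys]
    return mean(values), sqrt(variance(values) / len(values))

def design_based_estimate(strata):
    """Stratified mean and standard error (no finite population correction)."""
    pop_total = sum(n_pop for n_pop, _ in strata.values())
    est = sum((n_pop / pop_total) * mean(ys) for n_pop, ys in strata.values())
    var = sum(
        (n_pop / pop_total) ** 2 * variance(ys) / len(ys)
        for n_pop, ys in strata.values()
    )
    return est, sqrt(var)

if __name__ == "__main__":
    print("naive:       ", naive_estimate(strata))
    print("design-based:", design_based_estimate(strata))
```

In this toy setup the naive mean is pulled toward the oversampled stratum, and the naive standard error does not reflect the stratification, so both the point estimate and its uncertainty differ from the design-based result. Real survey analyses would typically also need to handle clustering and replicate or linearized variance estimation, which this sketch omits.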
