Dahl Fredrik A, Grotle Margreth, Saltyte Benth Jūrate, Natvig Bård
Helse Sør-Øst Health Services Research Centre, Akershus University Hospital, Lorenskog, Norway.
Eur J Epidemiol. 2008;23(4):237-42. doi: 10.1007/s10654-008-9230-x. Epub 2008 Feb 21.
There is growing concern in the scientific community that many published scientific findings may represent spurious patterns that are not reproducible in independent data sets. A reason for this is that significance levels or confidence intervals are often applied to secondary variables or sub-samples within the trial, in addition to the primary hypotheses (multiple hypotheses). This problem is likely to be extensive for population-based surveys, in which epidemiological hypotheses are derived after seeing the data set (hypothesis fishing). We recommend a data-splitting procedure to counteract this methodological problem, in which one part of the data set is used for identifying hypotheses, and the other is used for hypothesis testing. The procedure is similar to two-stage analysis of microarray data. We illustrate the process using a real data set related to predictors of low back pain at 14-year follow-up in a population initially free of low back pain. "Widespreadness" of pain (pain reported in several other places than the low back) was a statistically significant predictor, while smoking was not, despite its strong association with low back pain in the first half of the data set. We argue that the application of data splitting, in which an independent party handles the data set, will achieve for epidemiological surveys what pre-registration has done for clinical studies.
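The data-splitting procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`split_in_half`, `chi_square_p`, `split_sample_analysis`), the 2x2 chi-square test, the 0.05 screening threshold, and the Bonferroni correction in the confirmation half are all choices made for this example, and the abstract does not say which tests or thresholds the authors used. Records are assumed to be dictionaries of boolean values.

```python
import math
import random


def split_in_half(records, seed):
    """Randomly partition records into an exploration and a confirmation half.

    In the procedure the abstract recommends, an independent party would
    perform this step and withhold the confirmation half from the analysts.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def chi_square_p(records, predictor, outcome):
    """P-value of a 2x2 chi-square test (1 df, no continuity correction)."""
    a = sum(1 for r in records if r[predictor] and r[outcome])
    b = sum(1 for r in records if r[predictor] and not r[outcome])
    c = sum(1 for r in records if not r[predictor] and r[outcome])
    d = sum(1 for r in records if not r[predictor] and not r[outcome])
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: treat as no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def split_sample_analysis(records, candidates, outcome,
                          seed=0, screen_alpha=0.05, test_alpha=0.05):
    """Identify hypotheses on one half, then test them on the other half."""
    explore, confirm = split_in_half(records, seed)

    # Stage 1 (hypothesis fishing is allowed here): screen every candidate.
    hypotheses = [p for p in candidates
                  if chi_square_p(explore, p, outcome) < screen_alpha]
    if not hypotheses:
        return {}

    # Stage 2: test only the pre-specified hypotheses on untouched data.
    # Assumption: Bonferroni correction over the hypotheses carried forward.
    threshold = test_alpha / len(hypotheses)
    results = {}
    for p in hypotheses:
        p_value = chi_square_p(confirm, p, outcome)
        results[p] = (p_value, p_value < threshold)
    return results
```

A call such as `split_sample_analysis(records, ["smoking", "widespread_pain"], "low_back_pain")` (hypothetical field names) returns, for each predictor that passed screening, its p-value in the confirmation half and whether it survived the corrected threshold. A predictor that looks strong in the exploration half but fails in the confirmation half, as smoking did in the abstract's example, appears in the result with `False`.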