Wallach Joshua D, Sullivan Patrick G, Trepanowski John F, Sainani Kristin L, Steyerberg Ewout W, Ioannidis John P A
Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California; Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, California.
Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California; Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, California; Department of Medicine, Stanford University School of Medicine, Stanford, California.
JAMA Intern Med. 2017 Apr 1;177(4):554-560. doi: 10.1001/jamainternmed.2016.9125.
Many published randomized clinical trials (RCTs) make claims for subgroup differences.
To evaluate how often subgroup claims reported in the abstracts of RCTs are actually supported by statistical evidence (P < .05 from an interaction test) and corroborated by subsequent RCTs and meta-analyses.
This meta-epidemiological survey examines data sets of trials with at least 1 subgroup claim, including Subgroup Analysis of Trials Is Rarely Easy (SATIRE) articles and Discontinuation of Randomized Trials (DISCO) articles. We used Scopus (updated July 2016) to search for English-language articles citing each of the eligible index articles with at least 1 subgroup finding in the abstract.
Articles with a subgroup claim in the abstract with or without evidence of statistical heterogeneity (P < .05 from an interaction test) in the text and articles attempting to corroborate the subgroup findings.
Study characteristics of trials with at least 1 subgroup claim in the abstract were recorded. Two reviewers extracted the data necessary to calculate subgroup-level effect sizes, standard errors, and the P values for interaction. For individual RCTs and meta-analyses that attempted to corroborate the subgroup findings from the index articles, trial characteristics were extracted. The Cochran Q test was used to reevaluate heterogeneity with the data from all available trials.
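As an illustration of the kind of calculation described above, the following is a minimal sketch of a Cochran Q test applied to subgroup-level effect sizes and standard errors, using inverse-variance weights. It is not the authors' code: the function names (`chi2_sf`, `cochran_q`) and the example numbers are hypothetical, and effects are assumed to be on an additive scale such as the log hazard ratio. With 2 subgroups, Q equals the square of the usual z statistic for the difference between subgroup effects, so the P value matches a 2-sided interaction test.

```python
import math


def chi2_sf(q, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    x = q / 2.0
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-x)
    else:
        a, sf = 0.5, math.erfc(math.sqrt(x))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        sf += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def cochran_q(effects, std_errors):
    """Return (Q, df, P value) for heterogeneity across effect estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical example: log hazard ratios in 2 subgroups of a single trial.
q, df, p = cochran_q(effects=[-0.5, 0.1], std_errors=[0.2, 0.2])
print(f"Q = {q:.2f}, df = {df}, P for interaction = {p:.3f}")
```

The same function can be applied to subgroup estimates pooled across several trials; only the inputs change.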
The number of subgroup claims in the abstracts of RCTs, the number of subgroup claims in the abstracts of RCTs with statistical support (subgroup findings), and the number of subgroup findings corroborated by subsequent RCTs and meta-analyses.
Sixty-four eligible RCTs made a total of 117 subgroup claims in their abstracts. Of these 117 claims, only 46 (39.3%) in 33 articles had evidence of statistically significant heterogeneity from a test for interaction. In addition, out of these 46 subgroup findings, only 16 (34.8%) ensured balance between randomization groups within the subgroups (eg, through stratified randomization), 13 (28.3%) entailed a prespecified subgroup analysis, and 1 (2.2%) was adjusted for multiple testing. Only 5 (10.9%) of the 46 subgroup findings had at least 1 subsequent pure corroboration attempt by a meta-analysis or an RCT. In all 5 cases, the corroboration attempts found no evidence of a statistically significant subgroup effect. In addition, all effect sizes from meta-analyses were attenuated toward the null.
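The percentages reported above follow directly from the stated counts. A short sketch of the arithmetic (the dictionary keys are illustrative labels, not terms from the article):

```python
# Counts taken from the results above; the labels are illustrative only.
total_claims = 117
findings = 46  # claims with P < .05 from an interaction test

share_with_support = round(100 * findings / total_claims, 1)

# Each of these is a subset of the 46 statistically supported findings.
subsets = {
    "balanced_within_subgroups": 16,
    "prespecified": 13,
    "adjusted_for_multiple_testing": 1,
    "pure_corroboration_attempt": 5,
}
shares = {name: round(100 * n / findings, 1) for name, n in subsets.items()}

print(share_with_support)
for name, pct in shares.items():
    print(name, pct)
```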
A minority of subgroup claims made in the abstracts of RCTs are supported by their own data (ie, a significant interaction effect). For those that have statistical support (P < .05 from an interaction test), most fail to meet other best practices for subgroup tests, including prespecification, stratified randomization, and adjustment for multiple testing. Attempts to corroborate statistically significant subgroup differences are rare; when done, the initially observed subgroup differences are not reproduced.