Lin Lifeng
Department of Statistics, Florida State University, Tallahassee, Florida, USA.
J Eval Clin Pract. 2021 Apr;27(2):356-364. doi: 10.1111/jep.13428. Epub 2020 Jun 10.
Amid growing concern in the recent literature about research replicability and the misuse and misconception of P-values, the fragility index (FI) has become an attractive measure for assessing the robustness (or fragility) of clinical study results with binary outcomes. It is defined as the minimum number of event status modifications needed to alter a study result's statistical significance (or non-significance). Owing to its intuitive concept, the FI has been applied to assess the fragility of clinical studies in various specialties. However, the FI may be limited in certain settings, and because it is a relatively new measure, more work is needed to examine its properties.
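The definition above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the article's implementation: the function names `fisher_exact_p` and `fragility_index` are hypothetical, the P-value is assumed to come from a two-sided Fisher exact test, modifications are allowed in either arm and in either direction, and the 1/20 versus 9/20 trial is an artificial example.

```python
from math import comb

def fisher_exact_p(e1, n1, e2, n2):
    """Two-sided Fisher exact P-value for e1/n1 events versus e2/n2 events."""
    k = e1 + e2                                  # total events (fixed margin)
    total = comb(n1 + n2, k)

    def prob(x):                                 # hypergeometric P(arm 1 has x events)
        return comb(n1, x) * comb(n2, k - x) / total

    p_obs = prob(e1)
    lo, hi = max(0, k - n2), min(n1, k)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Smallest number of event status changes that flips significance (assumed search)."""
    significant = fisher_exact_p(e1, n1, e2, n2) < alpha
    for t in range(1, n1 + n2 + 1):              # t = total number of modifications
        for f1 in range(-t, t + 1):              # f1 > 0: non-events become events in arm 1
            rest = t - abs(f1)
            for f2 in {rest, -rest}:             # remaining modifications go to arm 2
                m1, m2 = e1 + f1, e2 + f2
                if not (0 <= m1 <= n1 and 0 <= m2 <= n2):
                    continue
                if (fisher_exact_p(m1, n1, m2, n2) < alpha) != significant:
                    return t
    return None                                  # significance cannot be altered

# Artificial trial: 1/20 events in arm 1 versus 9/20 events in arm 2.
print(fragility_index(1, 20, 9, 20))             # 2
```

In this artificial example the original result is significant (P is about 0.008), no single modification removes significance, and two modifications suffice (for instance, 3/20 versus 9/20 gives P of about 0.08), so the FI under these assumptions is 2. The same search also handles an initially non-significant result, where it counts the modifications needed to reach significance.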
This article explores several factors that may affect the derivation of the FI, including how event status is modified and which significance level is used. Moreover, we propose novel methods to visualize the fragility of a study's result. These factors and methods are illustrated using worked examples of artificial datasets. Randomized controlled trials of antidepressant drugs are also used to evaluate the real-world performance of these methods.
The FI depends on the treatment arm(s) in which event status is modified, whether the original study result is significant, the statistical method used for calculating the P-value, and the threshold for determining statistical significance. In addition, the proposed visualization methods clearly demonstrate a study result's fragility and may usefully supplement the single value of the FI.
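These dependencies can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the article's procedure: the names `chi2_p` and `fi_by_arm` and the `arms` option are hypothetical, the P-value is taken from a Pearson chi-squared test without continuity correction (chosen only to contrast with an exact test), and the 1/20 versus 9/20 data are artificial.

```python
from math import erfc, sqrt

def chi2_p(e1, n1, e2, n2):
    """P-value of the Pearson chi-squared test (1 df, no continuity correction)."""
    a, b, c, d = e1, n1 - e1, e2, n2 - e2
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                               # a zero margin: treat as no evidence
        return 1.0
    stat = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return erfc(sqrt(stat / 2))                  # chi-squared survival function, 1 df

def fi_by_arm(e1, n1, e2, n2, alpha=0.05, arms="both"):
    """FI when modifications are restricted to 'arm1', 'arm2', or allowed in 'both'."""
    significant = chi2_p(e1, n1, e2, n2) < alpha
    for t in range(1, n1 + n2 + 1):              # t = total number of modifications
        for f1 in range(-t, t + 1):              # change in event count of arm 1
            rest = t - abs(f1)
            for f2 in {rest, -rest}:             # change in event count of arm 2
                if arms == "arm1" and f2 != 0:
                    continue
                if arms == "arm2" and f1 != 0:
                    continue
                m1, m2 = e1 + f1, e2 + f2
                if not (0 <= m1 <= n1 and 0 <= m2 <= n2):
                    continue
                if (chi2_p(m1, n1, m2, n2) < alpha) != significant:
                    return t
    return None                                  # significance cannot be altered

# Artificial trial: 1/20 versus 9/20 events, at two significance thresholds.
for alpha in (0.05, 0.01):
    for arms in ("arm1", "arm2", "both"):
        print(alpha, arms, fi_by_arm(1, 20, 9, 20, alpha, arms))
```

For this artificial trial the sketch gives an FI of 3, 4, and 3 at the 0.05 threshold when modifications are restricted to arm 1, restricted to arm 2, or allowed in both arms, and 1, 2, and 1 at the 0.01 threshold. A single reported FI therefore reflects the choice of arm, test, and threshold as well as the data.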
Our findings may help clinicians properly use the FI and appraise the reliability of a study's conclusion.