Schünemann Holger J
Department of Clinical Epidemiology & Biostatistics, McMaster University Health Sciences Centre, Room 2C16, 1280 Main Street West, Hamilton, ON L8N 4K1, Canada.
J Clin Epidemiol. 2016 Jul;75:6-15. doi: 10.1016/j.jclinepi.2016.03.018. Epub 2016 Apr 6.
This article responds to issues raised by Antilla et al. in the Journal of Clinical Epidemiology about the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group's approach to rating imprecision and about GRADE's use of statistics. They argue that GRADE confuses statistical terms and should provide a stepwise rating of imprecision for making decisions. Here, a clarification of those perceptions is provided. GRADE's ratings of imprecision and of the other quality of evidence domains are an iterative process. When systematic review authors rate imprecision, this process may or may not take into account thresholds for effects that people regard as important. Regardless of the ratings in systematic reviews, those making recommendations for decisions, such as guideline panels, should consider whether they agree with these suggested thresholds or need to revise them, so that they can make informed ratings of imprecision. Decision-relevant thresholds result from a complex interplay among the outcomes that are critical for decision-making. The certainty in the evidence for one critical outcome, and the resulting possible certainty range that I conceptualize in this article, may influence the ratings of other outcomes. To relieve systematic review authors of the often challenging burden of defining worthwhile or important effects when judging precision against the optimal information size (OIS), a modified OIS, the review information size (RIS), could be used to rate imprecision at the systematic review stage. The RIS focuses only on plausible effects rather than on effects that are both plausible and worthwhile. The advantages of using the RIS include avoiding reliance on statistical significance alone and avoiding the varying thresholds on which the OIS relies, which result from the importance and the baseline risk of the outcome.
Finally, I argue that GRADE's certainty in the evidence is related to the statistical definition of accuracy. However, because GRADE is applied broadly to other ratings of certainty, such as those for qualitative evidence, statistical accuracy cannot serve as the definition of GRADE's quality of, or certainty in, the evidence.
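The abstract's point that OIS-based thresholds shift with the baseline risk of the outcome and with the effect judged important can be illustrated with a conventional sample size calculation. GRADE guidance describes the OIS as the total number of participants that such a calculation would require for a single adequately powered trial. The sketch below is a minimal illustration, assuming a dichotomous outcome, equal allocation, a two-sided alpha of 0.05, 80% power, and an effect expressed as a relative risk reduction (RRR). The function name `optimal_information_size` and its parameters are hypothetical and are not part of any GRADE tool. The abstract gives no formula for the RIS, so it is not implemented here. If one assumes that the RIS mainly differs in the effect size supplied (a plausible effect rather than a worthwhile one), the same function could be called with a different `rrr`, but that is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist


def optimal_information_size(control_risk: float, rrr: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
    """Total sample size (both arms) for a two-arm trial with a binary outcome.

    Hypothetical helper: uses the standard normal-approximation formula for
    comparing two independent proportions with equal allocation.
    """
    if not 0 < control_risk < 1 or not 0 < rrr < 1:
        raise ValueError("control_risk and rrr must lie strictly between 0 and 1")
    p1 = control_risk                  # baseline (control group) event risk
    p2 = control_risk * (1 - rrr)      # intervention risk implied by the RRR
    p_bar = (p1 + p2) / 2              # pooled risk under the null hypothesis
    z = NormalDist()                   # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # quantile for the two-sided type I error
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    n_per_arm = ceil(numerator / (p1 - p2) ** 2)
    return 2 * n_per_arm


# ASSUMPTION: illustrative inputs only. The same 25% RRR needs more participants
# when the baseline risk is lower, and a smaller effect threshold needs more again.
for risk, rrr in [(0.10, 0.25), (0.05, 0.25), (0.10, 0.20)]:
    print(f"baseline {risk:.0%}, RRR {rrr:.0%}: OIS = {optimal_information_size(risk, rrr)}")
```

Because the result depends on both the baseline risk and the chosen effect, two reviews of the same outcome can arrive at different OIS thresholds. This is the variability that the abstract says the RIS is intended to avoid.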