University of Birmingham.
Warwick Medical School, University of Warwick.
Milbank Q. 2019 Mar;97(1):228-284. doi: 10.1111/1468-0009.12375.
Policy Points:
The use of standardized mortality rates (SMRs) to profile hospitals presumes differences in preventable deaths, and at least one health system has suggested measuring preventable death rates of hospitals for comparison across time or in league tables. The influence of reliability on the optimal review number per case note or hospital for such a program has not been explored.
Estimates for preventable death rates using implicit case note reviews by clinicians are quite low, suggesting that SMRs will not work well to rank hospitals, and any misspecification of the risk-adjustment models will produce a high risk of mislabelling outliers.
Most studies achieve only fair to moderate reliability of the direct assessment of whether a death is preventable, and thus it is likely that substantial numbers of reviews of deaths would be required to distinguish preventable from nonpreventable deaths as part of learning from individual cases, or for profiling hospitals. Furthermore, population- and hospital system-specific data on the variation in preventable deaths or adverse events across the hospitals and providers to be compared are required in order to design a measurement procedure and the number of reviews needed to distinguish between the patients or hospitals.
There is interest in monitoring avoidable or preventable deaths measured directly or indirectly through standardized mortality rates (SMRs). While there have been numerous studies in recent years on adverse events, including preventable deaths, using implicit case note reviews by clinicians, no systematic reviews have aimed to summarize the estimates or the variations in methodologies used to derive these estimates. We reviewed studies that use implicit case note reviews to estimate the range of preventable death rates observed, the measurement characteristics of those estimates, and the measurement procedures used to generate them. We comment on the implications for monitoring SMRs and illustrate a way to calculate the number of reviews needed to establish a reliable estimate of the preventability of one death or the hospital preventable death rate.
We conducted a systematic review of the literature, supplemented by a reanalysis of the authors' previously published and unpublished data and by measurement design calculations. We conducted initial searches in PubMed, MEDLINE (OvidSP), and ISI Web of Knowledge in June 2010 and updated them in June 2012 and December 2017. Eligible studies covered hospital-wide admissions from general and acute medical wards, provided preventable death rates or allowed them to be estimated, and could provide interobserver variations.
Twenty-three studies from 1985 to 2017 were included. Recent larger studies suggest consistently low rates of preventable deaths (interquartile range of 3.0%-6.0% since 2008). The reliability of a single review for distinguishing between individual cases with regard to preventability had a Kappa statistic of 0.10-0.50 for deaths and 0.21-0.76 for adverse events. With a Kappa of 0.35, an average of 8 to 17 reviews of a single case would be needed for an assessment precise enough to support high-stakes decisions, such as changing care procedures or imposing sanctions within a hospital. No study estimated the variation in preventable deaths across hospitals, although we were able to reanalyze one study to obtain an estimate. Based on this estimate, 200 to 300 total case note reviews per hospital could be required to reliably distinguish between hospitals. The studies displayed considerable heterogeneity: 13/23 studies defined preventable death with a threshold of greater than or equal to four on a six-category Likert scale and 11/24 involved a two-stage screening process with nurses at the first stage and physicians at the second. Fifteen studies provided expert clinical review support for reviewer disagreements, advice, and quality control. A "generalist/internist" was the modal physician specialty for reviewers, who received one to three days of orientation to generic tools and case note review practice. Methods did not consider the influence of human or environmental factors.
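The "8 to 17 reviews" figure can be illustrated with a short calculation. The abstract does not state how the range was derived, so the following is only a sketch under stated assumptions: it treats the Kappa of 0.35 as the reliability of a single review and applies the Spearman-Brown prophecy formula with target reliabilities of 0.80 and 0.90. Those two targets, and the function names `reviews_needed` and `pooled_reliability`, are assumptions introduced here for illustration; they happen to reproduce the reported range.

```python
import math


def pooled_reliability(single_review_reliability: float, n_reviews: int) -> float:
    """Spearman-Brown: reliability of the mean of n independent reviews."""
    r = single_review_reliability
    return n_reviews * r / (1 + (n_reviews - 1) * r)


def reviews_needed(single_review_reliability: float, target_reliability: float) -> int:
    """Smallest number of reviews whose average reaches the target reliability.

    Assumes the single-review reliability (here, Kappa) behaves like an
    intraclass correlation, so the Spearman-Brown formula can be inverted.
    """
    r, t = single_review_reliability, target_reliability
    if not (0 < r < 1 and 0 < t < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = t * (1 - r) / (r * (1 - t))
    # Small tolerance so that an exact integer is not rounded up by float error.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    kappa = 0.35  # single-review reliability quoted in the abstract
    for target in (0.80, 0.90):  # assumed targets, not stated in the abstract
        n = reviews_needed(kappa, target)
        print(f"target {target:.2f}: {n} reviews, "
              f"pooled reliability {pooled_reliability(kappa, n):.3f}")
```

Under these assumptions the required number of reviews rises steeply as single-review reliability falls, which is consistent with the abstract's point that fair to moderate reliability implies many reviews per case.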
The literature provides limited information about the measurement characteristics of preventable deaths, suggesting that substantial numbers of reviews may be needed to create reliable estimates of preventable deaths at the individual or hospital level. Any operational program would require population-specific estimates of reliability. Preventable death rates are low, which is likely to make it difficult to use SMRs based on all deaths to validly profile hospitals. The literature provides little information to guide improvements in the measurement procedures.