Institute of Health Policy and Management, Erasmus University Rotterdam, Rotterdam, The Netherlands.
Med Care. 2013 Aug;51(8):731-9. doi: 10.1097/MLR.0b013e3182992bc1.
Individual physicians are increasingly being subjected to comparative performance assessments. When single-insurer data are used to profile individual physicians' performance, the reliability of the measurements is uncertain because of small sample sizes.
Administrative data (2006-2008) from a Dutch insurer are used to examine variation in general practitioners' (GPs) performance on expenses (5 measures), utilization of hospital care (2 measures), and clinical quality for diabetes and chronic obstructive pulmonary disease (6 measures). Unadjusted and adjusted multilevel models are used to separate total variance into between-GP and within-GP components. The components are used to calculate intraclass correlation coefficients (ICCs), reliability, and sample size requirements at common reliability thresholds.
Average ICCs varied between 0.07% (hospital admissions) and 8.34% (physiotherapy for chronic obstructive pulmonary disease patients). Risk adjustment often greatly changed the relative size of the variance components and often led to lower ICCs. In addition, ICCs and thus reliability generally decreased over time. Eight measures had reliabilities > 0.70, and 3 of these (all GP-related expenses) > 0.90. Measures related to utilization of hospital care had reliabilities < 0.60 or even < 0.50. For 5 measures, the vast majority of GPs had sufficient patients to reach 0.70 reliability. At a reliability of 0.90, however, there were no measures for which all GPs met the sample size requirements.
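As a rough illustration of how an ICC translates into reliability and a minimum panel size, the sketch below uses the standard Spearman-Brown relationship, reliability = n·ICC / (1 + (n − 1)·ICC). This is an assumption: the abstract does not state the exact formula the authors used, and the function names are hypothetical. The two ICC values are the reported extremes (0.07% and 8.34%), so the resulting patient counts are illustrative only and are not figures from the study.

```python
import math


def icc(var_between: float, var_within: float) -> float:
    """Share of total variance attributable to differences between GPs."""
    return var_between / (var_between + var_within)


def reliability(icc_value: float, n_patients: int) -> float:
    """Spearman-Brown reliability of a GP's mean score over n_patients.

    Assumed formula; the abstract does not spell out the one used.
    """
    return n_patients * icc_value / (1 + (n_patients - 1) * icc_value)


def required_patients(icc_value: float, target: float) -> int:
    """Smallest whole number of patients per GP that reaches the target."""
    return math.ceil(target * (1 - icc_value) / (icc_value * (1 - target)))


if __name__ == "__main__":
    # Reported average ICC extremes, converted from percent to proportions.
    reported = {
        "hospital admissions": 0.0007,
        "physiotherapy (COPD)": 0.0834,
    }
    for measure, value in reported.items():
        for target in (0.70, 0.90):
            n = required_patients(value, target)
            print(f"{measure}: ICC={value:.4f}, target={target:.2f}, patients needed={n}")
```

Under this assumed formula, an ICC of 8.34% needs 26 patients per GP for 0.70 reliability and 99 for 0.90, whereas an ICC of 0.07% needs patient counts in the thousands, which is consistent with the low reliabilities reported for the hospital utilization measures.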
Reliable measurement of individual physicians' performance using single-purchaser data is challenging. For most measures reliability was insufficient to allow for high-stakes applications or even any application of profiling. Future research should continue to explore methods for enhancing the reliability of individual physicians' profiles.