RAND Corporation, Pittsburgh, Pennsylvania.
University of California at Los Angeles, Los Angeles, California.
Health Serv Res. 2019 Apr;54(2):509-517. doi: 10.1111/1475-6773.13100. Epub 2018 Dec 11.
To sample 40 physician organizations for qualitative interviews, stratified on the basis of longitudinal total cost of care measures, in order to describe the range of care delivery structures and processes being deployed to influence the total cost of caring for patients.
Three years of physician organization-level total cost of care data (n = 156 physician organizations in California) from the Integrated Healthcare Association's value-based pay-for-performance program.
We fit mixture-model and K-means clustering algorithms to the total cost of care data to segment the population of physician organizations into sampling strata based on 3-year cost trajectories (ie, cost curves).
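As a minimal sketch of the K-means step (not the authors' code), each organization's cost curve can be treated as a point in three dimensions, one coordinate per year, and grouped with Lloyd's algorithm. The curve values, the helper names `sq_dist` and `kmeans`, and the choice of k = 2 are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Curve = Tuple[float, float, float]  # hypothetical layout: cost in year 1, 2, 3


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two cost curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(curves: List[Curve], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(curves, k)]  # k distinct curves as starting centroids
    labels: List[int] = []
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(c, centroids[j])) for c in curves]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


# Hypothetical 3-year cost curves for six organizations.
curves: List[Curve] = [
    (300, 305, 310), (302, 306, 309), (298, 304, 312),  # low, flat
    (450, 480, 520), (455, 478, 515), (448, 485, 525),  # high, rising
]
labels = kmeans(curves, k=2)
```

Each resulting cluster label would then serve as one sampling stratum.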
A mixture of multivariate normal distributions can classify physician organization cost curves into clusters defined by total cost level, shape, and within-cluster variation. K-means clustering does not accommodate differing levels of within-cluster variation and allocated more clusters to unstable cost curves. A mixture of regressions approach focuses too heavily on anomalous trajectories and is sensitive to model coding.
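The difference in how the two methods treat within-cluster variation can be illustrated with a small expectation-maximization (EM) fit of a Gaussian mixture. K-means assigns points by squared distance alone, which implicitly assumes every cluster has the same spherical spread; a mixture model instead estimates separate variances for each component. This sketch assumes diagonal covariance matrices (a simplification of a full multivariate normal mixture), a deterministic farthest-point start, and hypothetical data: a tight group of stable curves and a dispersed group of unstable curves. The function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def log_normal_diag(x: Point, mean: Point, var: Point) -> float:
    """Log density of a normal distribution with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def farthest_point_init(data: List[Point], k: int) -> List[List[float]]:
    """Start from the first curve, then repeatedly add the curve farthest from the chosen means."""
    means = [list(data[0])]
    while len(means) < k:
        far = max(data, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    return means


def gmm_diag(data: List[Point], k: int, n_iter: int = 200, var_floor: float = 1e-6
             ) -> Tuple[List[int], List[List[float]], List[List[float]]]:
    """Fit a k-component diagonal Gaussian mixture by EM; return hard labels, means, variances."""
    n, d = len(data), len(data[0])
    means = farthest_point_init(data, k)
    cols = list(zip(*data))
    overall_var = [max(sum((x - sum(c) / n) ** 2 for x in c) / n, var_floor) for c in cols]
    variances = [list(overall_var) for _ in range(k)]  # every component starts with the pooled spread
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability that each curve belongs to each component.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_normal_diag(x, means[j], variances[j]) for j in range(k)]
            top = max(logs)  # log-sum-exp shift for numerical stability
            probs = [math.exp(lp - top) for lp in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: weighted means, per-component variances, and mixing weights.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            means[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / nj for i in range(d)]
            variances[j] = [max(sum(r[j] * (x[i] - means[j][i]) ** 2 for r, x in zip(resp, data)) / nj,
                                var_floor) for i in range(d)]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means, variances


# Hypothetical curves: four stable organizations and four with unstable costs.
stable = [(300, 301, 302), (301, 300, 303), (299, 302, 301), (300, 300, 300)]
unstable = [(450, 560, 480), (540, 470, 590), (500, 600, 450), (580, 490, 530)]
labels, means, variances = gmm_diag(stable + unstable, k=2)
```

In this toy fit the stable curves end up in a component with small variances and the unstable curves in a single wide component, so the unstable group does not need extra clusters to be described.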
Statistical clustering can be used to form sampling strata when longitudinal measures are of primary interest. Many clustering algorithms are available; the choice of the clustering algorithm can strongly impact the resulting strata because various algorithms focus on different aspects of the observed data.
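Once strata are formed, the 40 interviews still have to be distributed across them. The abstract does not state the allocation rule, so the sketch below assumes proportional allocation with largest-remainder rounding, followed by simple random sampling without replacement within each stratum. The stratum labels and sizes in the example are hypothetical; only the totals (156 organizations, 40 interviews) come from the study description.

```python
import random
from typing import Dict, List


def allocate(stratum_sizes: Dict[str, int], total: int) -> Dict[str, int]:
    """Proportional allocation with largest-remainder rounding (assumed rule)."""
    n = sum(stratum_sizes.values())
    quotas = {s: total * size / n for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining slots to the strata with the largest fractional parts.
    for s in sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_sample(labels: List[str], total: int, seed: int = 0) -> List[int]:
    """Return indices of sampled organizations, drawn without replacement within each stratum."""
    rng = random.Random(seed)
    by_stratum: Dict[str, List[int]] = {}
    for i, lab in enumerate(labels):
        by_stratum.setdefault(lab, []).append(i)
    alloc = allocate({s: len(ix) for s, ix in by_stratum.items()}, total)
    chosen: List[int] = []
    for s, ix in by_stratum.items():
        chosen.extend(rng.sample(ix, alloc[s]))
    return sorted(chosen)


# Hypothetical cluster labels for 156 organizations in four strata.
org_labels = ["A"] * 60 + ["B"] * 50 + ["C"] * 30 + ["D"] * 16
print(allocate({"A": 60, "B": 50, "C": 30, "D": 16}, 40))  # {'A': 15, 'B': 13, 'C': 8, 'D': 4}
picks = stratified_sample(org_labels, 40)
```

Other rules, such as equal allocation per stratum or oversampling small strata, would be equally valid choices for a qualitative study.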