Center for Innovation in Population Health, College of Public Health, University of Kentucky, Lexington.
Department of Economics, Boston University, Boston, Massachusetts.
JAMA Health Forum. 2024 Apr 5;5(4):e240625. doi: 10.1001/jamahealthforum.2024.0625.
IMPORTANCE: Models predicting health care spending and other outcomes from administrative records are widely used to manage and pay for health care, despite well-documented deficiencies. New methods are needed that can incorporate more than 70 000 diagnoses without creating undesirable coding incentives.
OBJECTIVE: To develop a machine learning (ML) algorithm, building on Diagnostic Item (DXI) categories and Diagnostic Cost Group (DCG) methods, that automates development of clinically credible and transparent predictive models for policymakers and clinicians.
DESIGN, SETTING, AND PARTICIPANTS: DXIs were organized into disease hierarchies and assigned an Appropriateness to Include (ATI) score to reflect vagueness and gameability concerns. A novel automated DCG algorithm iteratively assigned DXIs in 1 or more disease hierarchies to DCGs, identifying the sets of DXIs with the largest regression coefficient as dominant; the presence of a previously identified dominant DXI removed lower-ranked ones before the next iteration. The Merative MarketScan Commercial Claims and Encounters Database for commercial health insurance enrollees 64 years and younger was used. Data from January 2016 through December 2018 were randomly split 90% to 10% for model development and validation, respectively. Deidentified claims and enrollment data for each calendar year were delivered by Merative the following November and were analyzed from November 2020 to January 2024.
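The iterative dominance step described above can be illustrated with a toy sketch. This is not the authors' implementation. It makes several simplifying assumptions: it handles a single hierarchy, it assigns one DXI per iteration instead of a set of DXIs per DCG, and it uses the mean cost among remaining carriers as a stand-in for the regression coefficient. The function name `rank_by_dominance`, the DXI labels, and the costs are all hypothetical.

```python
from statistics import mean


def rank_by_dominance(people, costs):
    """Toy dominance loop for one hierarchy (illustrative assumption only).

    people: dict mapping person id -> set of DXI labels in the hierarchy
    costs:  dict mapping person id -> observed annual cost
    Returns the dominance order and each person's surviving DXIs.
    """
    remaining = {pid: set(dxis) for pid, dxis in people.items()}
    unassigned = set()
    for dxis in remaining.values():
        unassigned |= dxis

    order = []
    while unassigned:
        # Stand-in for the regression coefficient: mean cost among people
        # who still carry the DXI after earlier removals.
        score = {}
        for dxi in unassigned:
            carriers = [costs[pid] for pid, dxis in remaining.items() if dxi in dxis]
            score[dxi] = mean(carriers) if carriers else 0.0

        # Sorting first makes tie-breaking deterministic (alphabetical).
        dominant = max(sorted(unassigned), key=lambda d: score[d])
        order.append(dominant)
        unassigned.discard(dominant)

        # Presence of the dominant DXI removes that person's lower-ranked DXIs
        # before the next iteration.
        for pid, dxis in list(remaining.items()):
            if dominant in dxis:
                remaining[pid] = {dominant}
    return order, remaining


# Hypothetical DXI labels and costs, invented for this example.
people = {
    1: {"metastatic", "localized"},
    2: {"localized"},
    3: {"metastatic"},
    4: {"localized", "history"},
    5: {"history"},
}
costs = {1: 90000, 2: 20000, 3: 110000, 4: 25000, 5: 3000}

order, final = rank_by_dominance(people, costs)
print(order)
print(final)
```

In this toy data, person 4 carries both "localized" and "history"; once "localized" is ranked, "history" is dropped for that person, so its score in the next iteration reflects only person 5.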
MAIN OUTCOMES AND MEASURES: The main outcome was concurrent top-coded total health care cost. Model performance was assessed using validation sample weighted least-squares regression, mean absolute errors, and mean errors for rare and common diagnoses.
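As a rough illustration of these measures, the sketch below computes a weighted R², a weighted mean absolute error, and a weighted mean error on top-coded costs. The cap of 250 000, the weights (assumed here to be the fraction of the year enrolled), and all data values are hypothetical; the abstract does not state them, and the function names are invented for this example.

```python
def top_code(cost, cap):
    """Cap a cost at the top-coding threshold (cap value is an assumption)."""
    return min(cost, cap)


def weighted_metrics(actual, predicted, weights):
    """Return (weighted R^2, weighted MAE, weighted mean error).

    Mean error is predicted minus actual, so a negative value means the
    model underpredicts on average.
    """
    total_w = sum(weights)
    mean_actual = sum(w * a for a, w in zip(actual, weights)) / total_w
    sse = sum(w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights))
    sst = sum(w * (a - mean_actual) ** 2 for a, w in zip(actual, weights))
    r2 = 1.0 - sse / sst
    mae = sum(w * abs(a - p) for a, p, w in zip(actual, predicted, weights)) / total_w
    me = sum(w * (p - a) for a, p, w in zip(actual, predicted, weights)) / total_w
    return r2, mae, me


# Hypothetical validation records.
raw_costs = [1000, 5000, 400000]
actual = [top_code(c, cap=250000) for c in raw_costs]
predicted = [2000, 4000, 200000]
weights = [1.0, 0.5, 1.0]  # assumed: fraction of the year enrolled

r2, mae, me = weighted_metrics(actual, predicted, weights)
print(round(r2, 3), mae, me)
```

Restricting the three input lists to people with a given rare or common diagnosis before calling `weighted_metrics` would give the subgroup errors the abstract refers to.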
RESULTS: This study included 35 245 586 commercial health insurance enrollees 64 years and younger (65 901 460 person-years) and relied on 19 clinicians who provided reviews in the base model. The algorithm implemented 218 clinician-specified hierarchies compared with the US Department of Health and Human Services (HHS) hierarchical condition category (HCC) model's 64 hierarchies. The base model that dropped vague and gameable DXIs reduced the number of parameters by 80% (1624 of 3150), achieved an R² of 0.535, and kept mean predicted spending within 12% ($3843 of $31 313) of actual spending for the 3% of people with rare diseases. In contrast, the HHS HCC model had an R² of 0.428 and underpaid this group by 33% ($10 354 of $31 313).
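The two spending percentages for the rare-disease group follow directly from the dollar figures reported above. The short check below divides each gap by the actual mean spending of $31 313; the variable names are illustrative.

```python
actual_mean = 31313   # actual mean spending for people with rare diseases
base_gap = 3843       # gap between predicted and actual, base model
hcc_gap = 10354       # underpayment, HHS HCC model

base_share = base_gap / actual_mean
hcc_share = hcc_gap / actual_mean

print(f"base model gap: {base_share:.1%}")
print(f"HHS HCC underpayment: {hcc_share:.1%}")
```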
CONCLUSIONS AND RELEVANCE: In this study, by automating DXI clustering within clinically specified hierarchies, this algorithm built clinically interpretable risk models in large datasets while addressing diagnostic vagueness and gameability concerns.