Duke University School of Medicine, Durham, NC, USA; Duke Institute for Health Innovation, Durham, NC, USA.
Duke Institute for Health Innovation, Durham, NC, USA.
Int J Med Inform. 2021 Jul;151:104466. doi: 10.1016/j.ijmedinf.2021.104466. Epub 2021 Apr 16.
The primary purpose of this work is to systematically assess the performance trade-offs of four diagnosis code groupings on clinical prediction tasks: AHRQ-Elixhauser, Single-level CCS, truncated ICD-9-CM codes, and raw ICD-9-CM codes.
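To make the difference between the raw and truncated groupings concrete, here is a minimal sketch of turning per-patient ICD-9-CM code lists into binary (multi-hot) feature vectors. It assumes that "truncated" means keeping only the category before the decimal point; the abstract does not state the truncation length, and the function names and toy records are hypothetical illustrations rather than the authors' code.

```python
# Minimal sketch: multi-hot features from raw vs. truncated ICD-9-CM codes.
# Assumption: truncation keeps the category before the decimal point.
# All names and records below are hypothetical, not from the study.


def truncate_icd9(code: str) -> str:
    """Return the category portion of a dotted ICD-9-CM code, e.g. '428.0' -> '428'."""
    return code.strip().upper().split(".")[0]


def build_vocabulary(patients: list[list[str]]) -> list[str]:
    """Sorted list of distinct codes seen across all patients."""
    return sorted({code for codes in patients for code in codes})


def multi_hot(codes: list[str], vocabulary: list[str]) -> list[int]:
    """1 if the patient has the code anywhere in their record, else 0."""
    present = set(codes)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical toy records (not study data).
raw_records = [
    ["428.0", "401.9"],
    ["428.22", "250.00"],
]
truncated_records = [[truncate_icd9(c) for c in codes] for codes in raw_records]

raw_vocab = build_vocabulary(raw_records)
trunc_vocab = build_vocabulary(truncated_records)

print(raw_vocab)    # ['250.00', '401.9', '428.0', '428.22']
print(trunc_vocab)  # ['250', '401', '428']
print([multi_hot(c, trunc_vocab) for c in truncated_records])  # [[0, 1, 1], [1, 0, 1]]
```

Truncation merges "428.0" and "428.22" into one feature, shrinking the vocabulary from four columns to three. This is the basic trade-off between granularity and sparsity that the groupings above span.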
We used two distinct datasets from different geographic regions and patient populations and trained models for three prediction tasks: 1-year mortality following an ICU stay, 30-day mortality following surgery, and 30-day complications following surgery. We ran several commonly used binary classification models, including penalized logistic regression, random forest, and gradient boosted trees. Model performance was evaluated using the Area Under the Receiver Operating Characteristic curve (AUROC) and the Area Under the Precision-Recall Curve (AUCPR).
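The two evaluation metrics can be sketched in plain Python without external libraries. AUROC is computed here as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. AUCPR is approximated by average precision, a common step-wise estimator. The abstract does not say which AUCPR estimator was used, so treat this as an assumption. The example labels and scores are hypothetical.

```python
# Minimal sketch of AUROC (pairwise rank formulation) and AUCPR estimated
# as average precision. Assumption: the study's exact AUCPR estimator is
# not stated; average precision is used here as one common choice.


def auroc(labels: list[int], scores: list[float]) -> float:
    """P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Sum of (recall gain) * precision over descending score thresholds."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this threshold before scoring it.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical predictions for six patients (1 = outcome occurred).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(round(auroc(labels, scores), 3))              # 0.778
print(round(average_precision(labels, scores), 3))  # 0.867
```

Unlike AUROC, AUCPR depends on outcome prevalence, which is one reason to report both when outcomes such as 30-day postoperative mortality are rare.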
Single-level CCS, truncated codes, and raw codes significantly outperformed the AHRQ-Elixhauser grouping when predicting 30-day postoperative complications and 1-year mortality after ICU admission. Performance across groupings was more similar in the 30-day postoperative mortality prediction task.
Single-level CCS groupings aggregate raw codes into meaningful clinical concepts and consistently balance interoperability between ICD-9-CM and ICD-10-CM while maintaining strong model performance, as measured by AUROC and AUCPR. A key limitation is that experiments covered only two datasets and three prediction tasks; although these tasks were well labeled and their outcomes sufficiently prevalent, they do not encompass all modeling tasks and outcomes.
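The interoperability point can be illustrated with a crosswalk lookup: if codes from either ICD revision map to the same concept labels, a model trained on concept-level features does not need to change when the coding system does. The sketch below assumes a simple one-code-to-one-category lookup. `TOY_CROSSWALK` is a hypothetical, hand-written stand-in with made-up concept labels, not the published AHRQ CCS mapping, whose file format is not shown here.

```python
# Minimal sketch: collapsing raw ICD-9-CM and ICD-10-CM codes into shared
# concept-level features via a crosswalk. TOY_CROSSWALK and its concept
# labels are hypothetical; a real pipeline would load the published
# AHRQ CCS mapping files instead.

TOY_CROSSWALK: dict[str, str] = {
    # ICD-9-CM codes (dotted form)
    "428.0": "heart_failure",
    "428.22": "heart_failure",
    "250.00": "diabetes",
    # ICD-10-CM codes (dotted form)
    "I50.9": "heart_failure",
    "E11.9": "diabetes",
}


def group_codes(codes: list[str], crosswalk: dict[str, str],
                unmapped: str = "unmapped") -> list[str]:
    """Map each raw code to its concept; codes missing from the table share one bucket."""
    return sorted({crosswalk.get(code.strip().upper(), unmapped) for code in codes})


icd9_patient = ["428.0", "250.00"]
icd10_patient = ["I50.9", "E11.9"]
print(group_codes(icd9_patient, TOY_CROSSWALK))   # ['diabetes', 'heart_failure']
print(group_codes(icd10_patient, TOY_CROSSWALK))  # ['diabetes', 'heart_failure']
print(group_codes(["401.9"], TOY_CROSSWALK))      # ['unmapped'] (not in the toy table)
```

Both patients yield identical concept features even though their raw codes come from different ICD revisions. The explicit `unmapped` bucket keeps codes absent from the table visible rather than silently dropping them.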
Single-level CCS groupings may serve as a good baseline for future models that incorporate diagnosis codes as features in clinical prediction tasks. Code and a compute environment summary are provided along with the analyses to enable reproducibility and to support future research.