Department of Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
Respir Res. 2022 May 28;23(1):138. doi: 10.1186/s12931-022-02055-0.
Study of pulmonary arterial hypertension (PAH) in claims-based (CB) cohorts may facilitate understanding of disease epidemiology; however, previous CB algorithms to identify PAH have had limited test characteristics. We hypothesized that machine learning algorithms (MLAs) could accurately identify PAH in a CB cohort.
ICD-9/10 codes, CPT codes, or PAH medications were used to screen an electronic medical record (EMR) for possible PAH. A subset (Development Cohort) was manually reviewed, adjudicated as PAH or "not PAH", and used to train and test MLAs. A second subset (Refinement Cohort) was manually reviewed and combined with the Development Cohort to form the Final Cohort, which was again divided into training and testing sets, with MLA characteristics defined on the test set. The MLA was validated using an independent EMR cohort.
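The division of an adjudicated cohort into training and testing sets can be illustrated with a minimal sketch. The abstract does not state the split ratio, whether the split was stratified, or which tooling was used, so the function name `stratified_split`, the `label` field, the 25% test fraction, and the toy records below are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key="label", test_fraction=0.25, seed=0):
    """Split records into train/test sets, preserving the label ratio.

    Hypothetical helper: the split procedure used in the study is not
    described in the abstract.
    """
    rng = random.Random(seed)

    # Group records by their adjudicated label.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]          # copy so the input is not mutated
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Toy, made-up records (not study data): 20 "PAH" and 80 "not PAH".
toy = [{"id": i, "label": "PAH"} for i in range(20)]
toy += [{"id": 20 + i, "label": "not PAH"} for i in range(80)]

train, test = stratified_split(toy)
print(len(train), len(test))
```

Stratifying by label keeps the proportion of PAH to "not PAH" records the same in both sets, which matters when one class is much smaller than the other.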
The Development Cohort, comprising 194 PAH and 786 "not PAH" cases, was used to train and test the initial MLA. In the Final Cohort test set, the final MLA had a sensitivity of 0.88, specificity of 0.93, positive predictive value of 0.89, and negative predictive value of 0.92. Persistence and strength of PAH medication use and the CPT code for right heart catheterization were the principal MLA features. Applying the MLA to the EMR cohort using a split cohort internal validation approach, we found 265 additional non-confirmed cases of suspected PAH that exhibited typical PAH demographics, comorbidities, and hemodynamics.
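For readers less familiar with the four test characteristics reported above, the following sketch shows how each is computed from a confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical; the abstract does not report the underlying confusion matrix, and these numbers are not intended to reproduce the study's results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard test characteristics from confusion-matrix counts.

    tp/fn: true PAH cases classified as PAH / as "not PAH"
    tn/fp: true "not PAH" cases classified as "not PAH" / as PAH
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true cases detected
        "specificity": tn / (tn + fp),  # fraction of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # probability a positive call is a true case
        "npv": tn / (tn + fn),          # probability a negative call is a true non-case
    }


# Illustrative counts only (assumed, not taken from the study).
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of true cases in the evaluated set, so they would be expected to differ in a cohort with a different PAH prevalence.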
We developed and validated an MLA using only CB features that identified PAH in the EMR with strong test characteristics. When deployed across an entire EMR, the MLA identified cases with known features of PAH.