Real World Solutions, IQVIA, London, UK
Real World Solutions, IQVIA, Plymouth Meeting, Pennsylvania, USA
BMJ Health Care Inform. 2022 Mar;29(1). doi: 10.1136/bmjhci-2021-100510.
To develop and evaluate machine learning models to detect patients with suspected undiagnosed non-alcoholic steatohepatitis (NASH) for diagnostic screening and clinical management.
In this retrospective observational non-interventional study using administrative medical claims data from 1 463 089 patients, gradient-boosted decision trees were trained to detect patients with likely NASH from an at-risk patient population with a history of obesity, type 2 diabetes mellitus, metabolic disorder or non-alcoholic fatty liver (NAFL). Models were trained to detect likely NASH either in all at-risk patients or in the subset without a prior NAFL diagnosis (at-risk non-NAFL patients). Models were trained and validated using retrospective medical claims data and assessed using the areas under the precision-recall curve and the receiver operating characteristic curve (AUPRC and AUROC).
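The two evaluation metrics named above can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the authors' code: it computes AUROC from the Mann-Whitney rank statistic and estimates AUPRC with step-wise average precision, which is one common AUPRC estimator (the abstract does not say which estimator the study used). Labels are assumed to be 0/1 integers, the function names `auroc` and `average_precision` are hypothetical, and the gradient-boosted models themselves are not reproduced.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of indices that share the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        # All samples tied at this score enter together (one threshold).
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy example (not study data): 2 positives among 4 patients.
y_toy = [0, 0, 1, 1]
s_toy = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_toy, s_toy))              # 0.75
print(average_precision(y_toy, s_toy))  # about 0.8333
```

AUROC does not depend on class balance, whereas a random ranking has an expected AUPRC roughly equal to the share of positives in the evaluation set. That is why AUPRC values around 0.01 can still be informative when NASH is this rare.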
The 6-month incidences of NASH in claims data were 1 per 1437 at-risk patients and 1 per 2127 at-risk non-NAFL patients. The model trained to detect NASH in all at-risk patients had an AUPRC of 0.0107 (95% CI 0.0104 to 0.0110) and an AUROC of 0.84. At 10% recall, model precision was 4.3%, approximately 60 times the NASH incidence. The model trained to detect NASH in the at-risk non-NAFL cohort had an AUPRC of 0.0030 (95% CI 0.0029 to 0.0031) and an AUROC of 0.78. At 10% recall, model precision was 1%, approximately 20 times the NASH incidence.
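The precision-at-recall and "times above incidence" figures reduce to simple arithmetic. The sketch below assumes that precision at a target recall is read off at the highest score threshold whose recall meets the target, and that the multiple is precision divided by the 6-month incidence (1 in 1437 or 1 in 2127). Both are interpretations of the abstract rather than the study's documented procedure, and `precision_at_recall` and `lift_over_incidence` are hypothetical helper names.

```python
from typing import Sequence


def precision_at_recall(labels: Sequence[int], scores: Sequence[float],
                        target_recall: float) -> float:
    """Precision at the highest score threshold whose recall reaches target_recall."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Lower the threshold one distinct score at a time.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        if tp / n_pos >= target_recall:
            return tp / (tp + fp)
    raise AssertionError("unreachable: recall reaches 1 at the lowest threshold")


def lift_over_incidence(precision: float, one_in_n: int) -> float:
    """Ratio of precision to a base rate of 1 in `one_in_n` patients."""
    return precision * one_in_n


# Toy ranking (not study data): 3 positives among 10 patients.
y_toy = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
s_toy = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(precision_at_recall(y_toy, s_toy, 0.10))  # 1.0

# Figures reported in the abstract.
print(f"{lift_over_incidence(0.043, 1437):.1f}")  # 61.8
print(f"{lift_over_incidence(0.01, 2127):.1f}")   # 21.3
```

With the reported inputs the multiples come out at about 61.8 and 21.3, consistent with the rounded 60-fold and 20-fold figures.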
The low incidence of NASH in medical claims data corroborates the pattern of NASH underdiagnosis in clinical practice. Claims-based machine learning could facilitate the detection of patients with probable NASH for diagnostic testing and disease management.