Leveraging machine learning to identify acute myeloid leukemia patients and their chemotherapy regimens in an administrative database.

Affiliations

Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.

Perelman School of Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA.

Publication information

Pediatr Blood Cancer. 2023 May;70(5):e30260. doi: 10.1002/pbc.30260. Epub 2023 Feb 23.

Abstract

BACKGROUND

Administrative datasets are useful for identifying rare disease cohorts such as pediatric acute myeloid leukemia (AML). Previously, cohorts were assembled using labor-intensive, manual reviews of patients' longitudinal chemotherapy data.

METHODS

We used a two-step machine learning (ML) method to (i) identify pediatric patients with newly diagnosed AML and (ii) identify the chemotherapy courses of those patients in an administrative/billing database. Multiple ML algorithms were derived from 75% of a study sample of 2558 patients who had previously been manually reviewed, and the selected model was tested in the remaining hold-out sample. The selected model was also applied to assemble a new pediatric AML cohort and was further assessed in an external validation against a standalone cohort established by manual chart abstraction.
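The 75%/25% derivation and hold-out design described above can be illustrated with a minimal sketch. The abstract does not say how the split was drawn, so the stratified random split, the fixed seed, the function name `stratified_holdout_split`, and the synthetic labels below are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_holdout_split(labels, train_fraction=0.75, seed=0):
    """Split patient ids into derivation and hold-out sets.

    `labels` maps a (hypothetical) patient id to its manually reviewed
    label. Each label group is shuffled and cut separately so that both
    sets keep roughly the same label mix.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for patient_id, label in labels.items():
        by_label[label].append(patient_id)

    train_ids, test_ids = [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # fixed order before shuffling, for reproducibility
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train_ids.extend(ids[:cut])
        test_ids.extend(ids[cut:])
    return train_ids, test_ids


# Synthetic data only: 2558 ids with an arbitrary, made-up share of positives.
labels = {i: (i % 4 != 0) for i in range(2558)}
train_ids, test_ids = stratified_holdout_split(labels)
print(len(train_ids), len(test_ids))
```

Candidate models would be fitted and compared on `train_ids` only, and the single selected model scored once on `test_ids`, so the hold-out estimate is not influenced by model selection.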

RESULTS

For patient identification, the selected Support Vector Machine model yielded a sensitivity of 0.97 and a positive predictive value (PPV) of 0.97 in the hold-out test sample. For course-specific chemotherapy regimen and start date identification, the selected Random Forest model yielded overall PPV greater than or equal to 0.88 and sensitivity greater than or equal to 0.86 across all courses in the test sample. When applied to new cohort assembly, ML identified 3016 AML patients with 10,588 treatment courses. In the external validation subset, PPV was greater than or equal to 0.75 and sensitivity was greater than or equal to 0.82 for patient identification, and PPV was greater than or equal to 0.93 and sensitivity was greater than or equal to 0.94 for regimen identifications.
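For reference, sensitivity and PPV as reported above are both computed from true positives, false positives, and false negatives against the manually reviewed reference standard. The sketch below shows the standard definitions; the function name `sensitivity_and_ppv` and the toy identifiers are hypothetical and are not data from the study.

```python
def sensitivity_and_ppv(predicted, reference):
    """Return (sensitivity, PPV) for a set of predicted positives.

    sensitivity = TP / (TP + FN): share of true cases that were found.
    PPV         = TP / (TP + FP): share of flagged cases that are true.
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # flagged and truly positive
    fp = len(predicted - reference)   # flagged but not truly positive
    fn = len(reference - predicted)   # truly positive but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy example: ids 1-4 are true cases; the model flags ids 2-5.
sens, ppv = sensitivity_and_ppv(predicted={2, 3, 4, 5}, reference={1, 2, 3, 4})
print(sens, ppv)  # 0.75 0.75
```

The same function applies at the course level if each element is a (patient, course) pair rather than a patient id.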

CONCLUSION

A carefully designed ML model can accurately identify pediatric AML patients and their chemotherapy courses from administrative databases. This approach may be generalizable to other diseases and databases.
