Leveraging machine learning to identify acute myeloid leukemia patients and their chemotherapy regimens in an administrative database.

Author affiliations

Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.

Perelman School of Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA.

Publication information

Pediatr Blood Cancer. 2023 May;70(5):e30260. doi: 10.1002/pbc.30260. Epub 2023 Feb 23.

DOI:10.1002/pbc.30260

PMID:36815580

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10402395/

Abstract

BACKGROUND

Administrative datasets are useful for identifying rare disease cohorts such as pediatric acute myeloid leukemia (AML). Previously, cohorts were assembled using labor-intensive, manual reviews of patients' longitudinal chemotherapy data.

METHODS

We used a two-step machine learning (ML) method to identify, in an administrative/billing database, (i) pediatric patients with newly diagnosed AML and (ii) the chemotherapy courses of the identified AML patients. Multiple ML algorithms were derived from 75% of a study sample of 2558 previously manually reviewed patients, and the selected model was tested in the remaining hold-out sample. The selected model was also applied to assemble a new pediatric AML cohort and was further assessed in an external validation against a standalone cohort established by manual chart abstraction.
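The 75%/25% derivation and hold-out design described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say whether the split was stratified or what the class balance was, so the stratification, the function name `stratified_holdout_split`, the patient IDs, and the labels below are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_holdout_split(labels, train_fraction=0.75, seed=0):
    """Split patient IDs into derivation and hold-out sets.

    labels: dict mapping a patient ID to its manually reviewed label
            (assumed here: 1 = newly diagnosed AML, 0 = not AML).
    Stratifying by label is an assumption, not something the abstract states.
    """
    rng = random.Random(seed)

    # Group patient IDs by their reviewed label.
    by_label = defaultdict(list)
    for patient_id, label in labels.items():
        by_label[label].append(patient_id)

    train_ids, test_ids = [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # fixed order before shuffling, for reproducibility
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train_ids.extend(ids[:cut])
        test_ids.extend(ids[cut:])
    return train_ids, test_ids


# Hypothetical labels for 2558 reviewed patients; the real class balance is not given.
labels = {f"patient_{i}": int(i % 4 != 0) for i in range(2558)}
train_ids, test_ids = stratified_holdout_split(labels)
print(len(train_ids), len(test_ids))
```

Candidate models would then be fitted and compared using only `train_ids`, with `test_ids` left untouched until the selected model is evaluated once.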

RESULTS

For patient identification, the selected Support Vector Machine model yielded a sensitivity of 0.97 and a positive predictive value (PPV) of 0.97 in the hold-out test sample. For course-specific chemotherapy regimen and start date identification, the selected Random Forest model yielded an overall PPV greater than or equal to 0.88 and a sensitivity greater than or equal to 0.86 across all courses in the test sample. When applied to new cohort assembly, ML identified 3016 AML patients with 10,588 treatment courses. In the external validation subset, PPV was greater than or equal to 0.75 and sensitivity was greater than or equal to 0.82 for patient identification, and PPV was greater than or equal to 0.93 and sensitivity was greater than or equal to 0.94 for regimen identification.
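For readers less familiar with the two metrics reported above, the following sketch shows how sensitivity and PPV are computed from reference labels and model predictions. The function name and the counts in the example are invented for illustration; they are not the study's confusion matrix, which the abstract does not report.

```python
def sensitivity_and_ppv(true_labels, predicted_labels):
    """Return (sensitivity, PPV) for binary labels where 1 marks a positive case.

    sensitivity = TP / (TP + FN): share of true cases the model finds.
    PPV         = TP / (TP + FP): share of flagged cases that are true cases.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Illustrative counts only: 97 true positives, 3 false negatives,
# 3 false positives, and 50 true negatives.
true_labels = [1] * 97 + [1] * 3 + [0] * 3 + [0] * 50
predicted_labels = [1] * 97 + [0] * 3 + [1] * 3 + [0] * 50

sensitivity, ppv = sensitivity_and_ppv(true_labels, predicted_labels)
print(f"sensitivity={sensitivity:.2f}, PPV={ppv:.2f}")
```

For the course-level results, the same calculation would be repeated per chemotherapy course, and the "greater than or equal to" figures would correspond to the lowest value observed across courses.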

CONCLUSION

A carefully designed ML model can accurately identify pediatric AML patients and their chemotherapy courses from administrative databases. This approach may be generalizable to other diseases and databases.
