An Interpretable Machine Learning Framework for Rare Disease: A Case Study to Stratify Infection Risk in Pediatric Leukemia.

Author information

Al-Hussaini Irfan, White Brandon, Varmeziar Armon, Mehra Nidhi, Sanchez Milagro, Lee Judy, DeGroote Nicholas P, Miller Tamara P, Mitchell Cassie S

Affiliations

Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA.

Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Publication information

J Clin Med. 2024 Mar 20;13(6):1788. doi: 10.3390/jcm13061788.

DOI:10.3390/jcm13061788

PMID:38542012

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10970787/

Abstract

Background: Datasets on rare diseases, like pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either "high risk" or "low risk" in pediatric ALL (n = 580) and AML (n = 132) with an accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, like glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.
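The abstract reports two kinds of performance figures: classification accuracy for the "high risk" versus "low risk" labels and mean absolute error (MAE) for the predicted number of infections. The following is a minimal sketch of how these two metrics are conventionally computed. It is not the authors' code: the function names and all sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted counts."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels_true: Sequence[str], labels_pred: Sequence[str]) -> float:
    """Fraction of patients whose risk label was predicted correctly."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(labels_true, labels_pred))
    return correct / len(labels_true)


# Hypothetical values for illustration only; NOT data from the study.
observed_infections = [0, 3, 1, 5, 2]
predicted_infections = [1, 2, 1, 3, 4]
mae = mean_absolute_error(observed_infections, predicted_infections)

observed_risk = ["high", "low", "low", "high", "low"]
predicted_risk = ["high", "low", "high", "high", "low"]
acc = accuracy(observed_risk, predicted_risk)

print(f"MAE: {mae:.2f} infections per patient")
print(f"Accuracy: {acc:.0%}")
```

Read this way, an MAE of 2.26 for bacterial infections means the predicted count differed from the observed count by about 2.26 infections per patient on average, in either direction.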

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09f1/10970787/cf33f1591f50/jcm-13-01788-g0A1.jpg

Similar articles

Exploring Pattern of Relapse in Pediatric Patients with Acute Lymphocytic Leukemia and Acute Myeloid Leukemia Undergoing Stem Cell Transplant Using Machine Learning Methods.

J Clin Med. 2024 Jul 10;13(14):4021. doi: 10.3390/jcm13144021.

MAGIC-DR: An interpretable machine-learning guided approach for acute myeloid leukemia measurable residual disease analysis.

Cytometry B Clin Cytom. 2024 Jul;106(4):239-251. doi: 10.1002/cyto.b.22168. Epub 2024 Feb 28.

Leveraging machine learning to identify acute myeloid leukemia patients and their chemotherapy regimens in an administrative database.

Pediatr Blood Cancer. 2023 May;70(5):e30260. doi: 10.1002/pbc.30260. Epub 2023 Feb 23.

Interpretable machine learning models for hospital readmission prediction: a two-step extracted regression tree approach.

BMC Med Inform Decis Mak. 2023 Jun 5;23(1):104. doi: 10.1186/s12911-023-02193-5.

AutoScore-Imbalance: An interpretable machine learning tool for development of clinical scores with rare events data.

J Biomed Inform. 2022 May;129:104072. doi: 10.1016/j.jbi.2022.104072. Epub 2022 Apr 11.

Evaluation of a machine-learning model based on laboratory parameters for the prediction of acute leukaemia subtypes: a multicentre model development and validation study in France.

Lancet Digit Health. 2024 May;6(5):e323-e333. doi: 10.1016/S2589-7500(24)00044-X.

An Interpretable Longitudinal Preeclampsia Risk Prediction Using Machine Learning.

medRxiv. 2023 Aug 16:2023.08.16.23293946. doi: 10.1101/2023.08.16.23293946.

Intelligent diagnosis of Kawasaki disease from real-world data using interpretable machine learning models.

Hellenic J Cardiol. 2025 Jan-Feb;81:38-48. doi: 10.1016/j.hjc.2024.08.003. Epub 2024 Aug 10.

XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer's disease.

BMC Med Inform Decis Mak. 2023 Jul 25;23(1):137. doi: 10.1186/s12911-023-02238-9.

Cited by

Applications of Artificial Intelligence in Acute Promyelocytic Leukemia: An Avenue of Opportunities? A Systematic Review.

J Clin Med. 2025 Mar 1;14(5):1670. doi: 10.3390/jcm14051670.

Cross-Domain Text Mining of Pathophysiological Processes Associated with Diabetic Kidney Disease.

Int J Mol Sci. 2024 Apr 19;25(8):4503. doi: 10.3390/ijms25084503.
