An Interpretable Machine Learning Framework for Rare Disease: A Case Study to Stratify Infection Risk in Pediatric Leukemia.

Author information

Al-Hussaini Irfan, White Brandon, Varmeziar Armon, Mehra Nidhi, Sanchez Milagro, Lee Judy, DeGroote Nicholas P, Miller Tamara P, Mitchell Cassie S

Affiliations

Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA.

Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Publication information

J Clin Med. 2024 Mar 20;13(6):1788. doi: 10.3390/jcm13061788.

Abstract

Background: Datasets on rare diseases, like pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets.

Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL.

Results: An interpretable decision tree classified the risk of infection as either "high risk" or "low risk" in pediatric ALL (n = 580) and AML (n = 132) with accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, like glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE).

Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.
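For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how classification accuracy and mean absolute error (MAE) are conventionally computed. It is not the authors' code: the function names and all numbers are hypothetical toy values chosen for illustration, and they are unrelated to the study data.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |observed - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels_true: Sequence[str], labels_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return matches / len(labels_true)


# Hypothetical toy data (NOT from the study): infection counts per patient.
observed_infections = [0, 2, 5, 1]
predicted_infections = [1, 2, 3, 1]
toy_mae = mean_absolute_error(observed_infections, predicted_infections)

# Hypothetical toy data (NOT from the study): binary risk labels.
true_risk = ["high", "low", "low", "high", "low"]
pred_risk = ["high", "low", "high", "high", "low"]
toy_accuracy = accuracy(true_risk, pred_risk)

print(f"toy MAE = {toy_mae}, toy accuracy = {toy_accuracy}")
```

Read this way, an MAE of 2.26 for bacterial infections means the regression model's predicted infection count differed from the observed count by about 2.26 infections on average, and an accuracy of ∼79% means roughly four in five patients were assigned the correct "high risk" or "low risk" label.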


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09f1/10970787/cf33f1591f50/jcm-13-01788-g0A1.jpg
