Jeong Jae-Seung, Ju Hyunsu, Cho Chi-Hyun
Division of Artificial Intelligence Convergence Engineering, Sahmyook University, 01795, Republic of Korea.
Post-Silicon Semiconductor Institute, Korea Institute of Science and Technology, 02792, Republic of Korea.
Int J Med Sci. 2025 Apr 13;22(9):2208-2226. doi: 10.7150/ijms.109493. eCollection 2025.
This study measures expression of () and related cytokine genes in bone marrow mononuclear cells from patients with hematological malignancies and analyzes the relationships among them using an integrated framework of statistical analysis, machine learning (ML), and explainable artificial intelligence (XAI). Traditional dimensionality reduction techniques, such as principal component analysis, linear discriminant analysis, and t-distributed stochastic neighbor embedding, showed limited differentiation in the embedded space, whereas ML classifiers (k-Nearest Neighbors, Naïve Bayes, Random Forest, and XGBoost) successfully identified critical patterns. Notably, SHapley Additive exPlanations analyses highlighted normalized caspase-1 counts as consistently the most influential feature associated with NF-κB1 activity across disease groups. Systematic evaluation of ML performance on small datasets showed that a minimum sample size of 15 to 24 is necessary for reliable classification, particularly in the acute myeloid leukemia and myelodysplastic syndrome cohorts. These findings underscore the pivotal role of caspase-1 in NF-κB1 gene expression in hematologic malignancies. Furthermore, this study demonstrates the feasibility of leveraging ML and XAI to derive meaningful insights from limited data, offering a robust strategy for biomarker discovery and precision medicine in rare hematological malignancies.
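The idea of checking how classifier reliability changes with cohort size can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the study's protocol or data: the synthetic two-class samples, the single informative feature, the hypothetical helper names, the choice of k = 3, and leave-one-out scoring. It uses only the Python standard library and a plain k-Nearest Neighbors vote.

```python
import random
import statistics


def make_sample(n_per_class, rng, shift=1.5, n_features=4):
    """Synthetic two-class data (assumption): only feature 0 differs by class."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift * label  # the single informative feature
            X.append(row)
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy: each row is predicted from all the others."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)


def accuracy_by_sample_size(sizes, repeats=200, seed=0):
    """Mean and standard deviation of LOO accuracy for each total sample size."""
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        scores = [loo_accuracy(*make_sample(n // 2, rng)) for _ in range(repeats)]
        summary[n] = (statistics.mean(scores), statistics.stdev(scores))
    return summary


if __name__ == "__main__":
    for n, (mean, sd) in accuracy_by_sample_size([6, 10, 16, 24, 40]).items():
        print(f"n={n:>2}  mean LOO accuracy={mean:.2f}  sd={sd:.2f}")
```

In a sketch like this, the spread of the accuracy estimate across repeated draws is the quantity to watch: a sample size might be called adequate once that spread stops shrinking appreciably. The threshold chosen in any real analysis depends on the data and the classifier.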