Toya Takashi, Nakajima Yujiro, Hara Konan, Kaito Satoshi, Nishida Tetsuya, Uchida Naoyuki, Shingai Naoki, Takeda Wataru, Ozawa Yukiyasu, Tanaka Masatsugu, Yoshihara Satoshi, Katayama Yuta, Eto Tetsuya, Sawa Masashi, Ota Shuichi, Ohigashi Hiroyuki, Takada Satoru, Kataoka Keisuke, Kanda Junya, Fukuda Takahiro, Ogata Masao, Taguchi Ayumi, Atsuta Yoshiko
Hematology Division, Tokyo Metropolitan Komagome Hospital, Tokyo, Japan.
Department of Radiological Sciences, Komazawa University Graduate School, Setagaya, Japan.
EJHaem. 2025 Aug 9;6(4):e70117. doi: 10.1002/jha2.70117. eCollection 2025 Aug.
Clinically significant cytomegalovirus infection (csCMVi) and non-relapse mortality (NRM) remain serious concerns after allogeneic hematopoietic stem cell transplantation (HSCT), but the subpopulations with heterogeneous treatment effects (HTEs) are unclear. Although machine learning (ML) algorithms have recently been applied to HSCT, the methodology has not been well elucidated.
We developed an ML algorithm that combines weighting procedures with left-truncated and right-censored (LTRC) trees, based on the classification and regression tree (CART) algorithm, to comprehensively fit survival data with time-varying covariates and competing risks. We applied the algorithm to data from a large-scale Japanese registry to explore subpopulations with HTEs for csCMVi and NRM after HSCT. Its performance was evaluated by comparing its c-indices with those of the conventional Fine-Gray model.
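The abstract does not spell out the weighting procedure. One standard way to make competing-risks data usable by methods built for left-truncated, right-censored data is the Fine-Gray weighting scheme of Geskus (2011), which is also the idea behind the `finegray` function in the R `survival` package: a patient who experiences the competing event (for csCMVi, for example, death before csCMVi) stays in the risk set after that time, with weight w_i(t) = G(t-) / G(T_i-), where G is the Kaplan-Meier estimate of the censoring distribution. The Python sketch below is an illustrative assumption, not the authors' implementation; the helper names (`censoring_km`, `g_left`, `finegray_expand`) and the status coding (0 = censored, 1 = event of interest, 2 = competing event) are hypothetical. It emits (start, stop] rows, so the weighted data are themselves left-truncated; time-varying covariates such as acute GVHD would split these rows further at each change time, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One (start, stop] interval in counting-process format."""
    id: int
    start: float
    stop: float
    status: int    # 1 if the event of interest occurs at `stop`, else 0
    weight: float


def censoring_km(times, status):
    """Kaplan-Meier estimate of the censoring survival function G(t).

    Censoring (status 0) is treated as the 'event'. Returns a list of
    (time, G(time)) pairs at each censoring time, in increasing order.
    """
    jumps, g = [], 1.0
    for t in sorted({u for u, s in zip(times, status) if s == 0}):
        at_risk = sum(1 for u in times if u >= t)
        n_cens = sum(1 for u, s in zip(times, status) if u == t and s == 0)
        g *= 1.0 - n_cens / at_risk
        jumps.append((t, g))
    return jumps


def g_left(jumps, t):
    """G(t-): value of the censoring step function just before time t."""
    g = 1.0
    for u, val in jumps:
        if u >= t:
            break
        g = val
    return g


def finegray_expand(times, status, horizon=None):
    """Expand competing-risks data into weighted (start, stop] rows.

    Hypothetical status coding: 0 = censored, 1 = event of interest,
    2 = competing event.
    """
    jumps = censoring_km(times, status)
    horizon = max(times) if horizon is None else horizon
    rows = []
    for i, (t, s) in enumerate(zip(times, status)):
        # Everyone contributes an unweighted row from 0 to their own time.
        rows.append(Row(i, 0.0, t, 1 if s == 1 else 0, 1.0))
        if s != 2 or t >= horizon:
            continue
        # Competing-event patients stay at risk after t; the weight is
        # constant between censoring times and equals G(stop-) / G(t-).
        g_t = g_left(jumps, t)
        cuts = [u for u, _ in jumps if t < u < horizon] + [horizon]
        start = t
        for stop in cuts:
            rows.append(Row(i, start, stop, 0, g_left(jumps, stop) / g_t))
            start = stop
    return rows


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0, 5.0]
    status = [1, 0, 2, 0, 1]
    for row in finegray_expand(times, status):
        print(row)
```

In this toy example the patient with the competing event at time 3 keeps contributing rows (3, 4] and (4, 5] with weights 1.0 and 0.5. In principle, a tree or Cox model fitted to such rows with `weight` as case weights targets the subdistribution hazard of the event of interest, which is what the Fine-Gray model estimates.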
A total of 10,480 patients were divided into training (75%) and test (25%) cohorts; the training cohort was used to develop the ML model. The model identified patient CMV seropositivity, patient age, and acute graft-versus-host disease as important predictors of csCMVi. In addition, patients were successfully classified by the estimated cumulative incidence of csCMVi at 0.5 years, which ranged from 22.7% to 82.7%. The model also yields interpretable survival trees in various settings. Similarly, patients could be classified by the estimated 3-year NRM, which ranged from 8.0% to 48.5%. In the test cohort, the ML and Fine-Gray models showed comparable c-indices.
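Two quantities in this paragraph can be made concrete: the per-subgroup cumulative incidence used to classify patients (for example, csCMVi by 0.5 years) and the c-index used to compare the ML and Fine-Gray models. The sketch below assumes the standard nonparametric Aalen-Johansen estimator applied within one subgroup, and an unweighted version of the competing-risks adaptation of Harrell's c-index proposed by Wolbers and colleagues; the abstract does not say which estimators the authors used, and the function names and status coding (0 = censored, 1 = event of interest, 2 = competing event) are hypothetical.

```python
def aalen_johansen_cif(times, status, t_eval, cause=1):
    """Nonparametric cumulative incidence of `cause` by time t_eval.

    Hypothetical status coding: 0 = censored, 1 = event of interest,
    2 = competing event.
    """
    surv, cif = 1.0, 0.0  # all-cause event-free survival just before t; CIF
    for t in sorted(set(times)):
        if t > t_eval:
            break
        n_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, s in zip(times, status) if u == t and s == cause)
        d_any = sum(1 for u, s in zip(times, status) if u == t and s != 0)
        cif += surv * d_cause / n_risk
        surv *= 1.0 - d_any / n_risk
    return cif


def competing_risks_cindex(times, status, risk, cause=1):
    """Unweighted competing-risks c-index (no censoring weights).

    A pair (i, j) is comparable when i has the event of interest at T_i and
    j is either observed beyond T_i or had a competing event by T_i.
    A higher `risk` should mean an earlier event of interest.
    """
    concordant, comparable = 0.0, 0
    for i, (ti, si) in enumerate(zip(times, status)):
        if si != cause:
            continue
        for j, (tj, sj) in enumerate(zip(times, status)):
            if j == i:
                continue
            competing_before = sj not in (0, cause) and tj <= ti
            if tj > ti or competing_before:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

Under these assumptions, `aalen_johansen_cif` would be evaluated on the training patients in each terminal node of a tree, and `competing_risks_cindex` on risk scores predicted for the test cohort by each model. Unlike Harrell's c-index, patients who had the competing event before T_i stay in the comparison set, consistent with the subdistribution view.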
A reliable, explainable, and interpretable ML model was developed to explore subpopulations with HTEs for csCMVi and NRM after HSCT. The authors have confirmed that clinical trial registration is not needed for this submission.