Park Seongyong, Kim Seonkyu, Jiang Peng
Cancer Data Science Lab, CCR, NCI, NIH, Bethesda, MD, 20892, USA.
Aging Convergence Research Center, KRIBB, Daejeon, 34141, Republic of Korea.
Sci Rep. 2025 Aug 1;15(1):28044. doi: 10.1038/s41598-025-14166-x.
Bulk RNA-seq-based prediction of immune checkpoint blockade (ICB) responses has been extensively studied to distinguish responders from non-responders. However, cohort heterogeneity remains a major challenge, hindering the robustness and generalizability of predictive models across diverse RNA-seq datasets. In this study, we present IC2Bert, a novel model that employs masked gene expression pretraining combined with domain-specific supervised fine-tuning to enhance predictive robustness across heterogeneous ICB response cohorts. To ensure an objective evaluation, we assessed the model's performance using a Leave-One-Dataset-Out Cross-Validation (LODOCV) approach. IC2Bert demonstrated significantly improved predictive accuracy and robustness compared to existing methods, effectively addressing the challenges posed by cohort heterogeneity. The IC2Bert model and its source code are publicly available on GitHub: https://github.com/data2intelligence/ic2bert .
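The Leave-One-Dataset-Out Cross-Validation scheme named in the abstract can be illustrated with a minimal sketch: each cohort is held out once as the test set while all remaining cohorts form the training set. This is not taken from the IC2Bert repository; the function name `leave_one_dataset_out` and the cohort labels are hypothetical and serve only to show the splitting logic.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_dataset_out(
    cohort_labels: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_cohort, train_indices, test_indices), one fold per cohort."""
    # Group sample indices by the cohort they belong to.
    by_cohort: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(idx)

    # Hold out each cohort in turn; train on every other cohort.
    for held_out, test_idx in by_cohort.items():
        train_idx = [
            i
            for cohort, indices in by_cohort.items()
            if cohort != held_out
            for i in indices
        ]
        yield held_out, train_idx, test_idx


# Hypothetical cohort labels, one per patient sample (illustrative only).
cohorts = ["cohort_A", "cohort_A", "cohort_B", "cohort_C", "cohort_B", "cohort_C"]
folds = list(leave_one_dataset_out(cohorts))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Because no sample from the held-out cohort is seen during training, the score on each fold reflects how well a model transfers to an unseen cohort, which is the setting in which cohort heterogeneity matters.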