Bertsimas Dimitris, Ning Catherine, Lønning Per Eystein, Baba Hideo, Endo Itaru, Burkhart Richard, Aucejo Federico N, Balzer Felix, Kreis Martin E, Margonis Georgios Antonios
Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, USA.
Department of Clinical Science, University of Bergen, Department of Oncology, Haukeland University Hospital, Bergen, Norway.
Res Sq. 2024 Dec 12:rs.3.rs-5467577. doi: 10.21203/rs.3.rs-5467577/v1.
We present, to our knowledge, the first methodological study aimed at enhancing the prognostic power of Cox regression models, widely used in survival analysis, through optimized data selection. Our approach employs a novel two-stage mechanism: by framing the prognostic stratum matching problem intuitively, we select prognostically representative patient observations to create a more balanced training set. This enables the model to assign equal attention to distinct prognostic subgroups. We demonstrate the methodology using an observational dataset of 1,799 patients with resected colorectal cancer liver metastases, 1,197 of whom received adjuvant chemotherapy and 602 of whom did not. The comparator, reflecting current standard practice, was a prognostic model trained on the entire cohort (referred to as "model 1"). Models trained on the untreated and treated subgroups, matched through our approach (referred to as "model 3"), showed an improvement of up to 20% in bootstrapped C-indices compared to model 1. Notably, model 3 exhibited superior calibration, with a 6- to 10-fold improvement over model 1. Additional performance metrics aligned with these findings, and robustness was confirmed through bias-corrected bootstrapping. Given the ongoing development of numerous linear prognostic models and the general applicability of our approach to any observational data, this method holds significant potential to impact biomedical research and clinical practice where prognostic models are utilized.
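The abstract compares models by their bootstrapped C-indices. As a rough illustration of what that metric measures, the sketch below computes Harrell's concordance index and a simple bootstrap summary of it. This is not the authors' code. The function names `harrell_c_index` and `bootstrap_c_index` are hypothetical, pairs with tied survival times are skipped for simplicity, and a plain percentile bootstrap stands in for the bias-corrected bootstrapping named in the abstract.

```python
import random


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs that the risk score orders correctly.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : model risk scores (higher = expected earlier event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time (ties are skipped).
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_c_index(times, events, risks, n_boot=200, seed=0):
    """Mean C-index over resamples and a 95% percentile interval (assumed settings)."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        try:
            stats.append(harrell_c_index(
                [times[k] for k in idx],
                [events[k] for k in idx],
                [risks[k] for k in idx],
            ))
        except ValueError:
            continue  # skip degenerate resamples with no comparable pairs
    if not stats:
        raise ValueError("no usable bootstrap resamples")
    stats.sort()
    m = len(stats)
    lower = stats[int(0.025 * (m - 1))]
    upper = stats[int(0.975 * (m - 1))]
    return sum(stats) / m, (lower, upper)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a relative gain in C-index reflects how much more often the model ranks the earlier-failing patient of a pair as higher risk.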