Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada.
Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada.
BMC Med Res Methodol. 2024 Apr 8;24(1):83. doi: 10.1186/s12874-024-02185-7.
Timing is an essential factor in the efficacy of cancer treatment: patients who will not respond to the current therapy should be switched to a different treatment as early as possible. Machine learning models can be built to classify patients as responders or non-responders. Such classification models predict the probability that a patient is a responder, and most methods then apply a probability threshold of 0.5 to convert these probabilities into binary group membership. However, a cutoff of 0.5 is not always the optimal choice.
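As a minimal illustration of the thresholding step described above, the following Python sketch converts predicted responder probabilities into binary labels. The function name `classify`, the example probabilities, and the convention that a probability equal to the cutoff counts as a responder are assumptions made for illustration; they are not taken from the study.

```python
def classify(probabilities, cutoff=0.5):
    """Label each patient 1 (responder) if the predicted probability
    is at least `cutoff`, otherwise 0 (non-responder).

    The tie rule (>= rather than >) is an assumption of this sketch.
    """
    return [1 if p >= cutoff else 0 for p in probabilities]


# Hypothetical predicted probabilities for four patients.
probs = [0.35, 0.48, 0.52, 0.80]

default_labels = classify(probs)        # conventional cutoff of 0.5
lower_labels = classify(probs, 0.4)     # a lower cutoff labels more responders

print(default_labels)  # [0, 0, 1, 1]
print(lower_labels)    # [0, 1, 1, 1]
```

The second call shows why the choice of cutoff matters: the patient with a probability of 0.48 changes group when the cutoff moves from 0.5 to 0.4.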
In this study, we propose a novel data-driven approach that selects a better cutoff value based on an optimal cross-validation technique. To illustrate the method, we applied it to three clinical trial datasets of small-cell lung cancer patients. We used two datasets to build a scoring system for segmenting patients and then applied the resulting models to segment the patients in the test data.
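The abstract does not state which criterion the cross-validation optimizes, so the following Python sketch only illustrates the general idea of choosing a cutoff from data rather than fixing it at 0.5. It assumes that out-of-fold predicted probabilities and known response labels are already available, and it scores each candidate cutoff by Youden's J statistic (sensitivity + specificity - 1) averaged over K folds. The criterion, the function names, and the toy data are all assumptions of this sketch, not the authors' procedure.

```python
def youden_j(probs, labels, cutoff):
    """Youden's J (sensitivity + specificity - 1) at a given cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def select_cutoff(probs, labels, candidates, k=5):
    """Return (cutoff, mean_score) for the candidate whose fold-averaged
    Youden's J is highest. Folds are formed by striding through the data."""
    n = len(probs)
    folds = [list(range(i, n, k)) for i in range(k)]
    best_cutoff, best_score = None, float("-inf")
    for cutoff in candidates:
        scores = [
            youden_j([probs[i] for i in fold], [labels[i] for i in fold], cutoff)
            for fold in folds
            if fold  # skip empty folds when n < k
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_cutoff, best_score = cutoff, mean_score
    return best_cutoff, best_score


# Hypothetical out-of-fold probabilities and true response labels.
probs = [0.10, 0.15, 0.20, 0.25, 0.32, 0.35, 0.40, 0.45, 0.60, 0.70]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

best_cutoff, best_score = select_cutoff(probs, labels, [0.3, 0.5, 0.7], k=2)
print(best_cutoff, best_score)  # 0.3 1.0
```

In this toy example several true responders have predicted probabilities below 0.5, so a cutoff of 0.3 separates the groups better than the conventional 0.5. A survival-based criterion could be substituted for Youden's J without changing the overall structure.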
We found that, in the test data, the predicted responders and non-responders had significantly different long-term survival outcomes. Our proposed method segments patients better than the standard approach using a cutoff of 0.5. When comparing clinical outcomes of responders versus non-responders, our method yielded a p-value of 0.009 with a hazard ratio of 0.668 using the Cox proportional hazards model and a p-value of 0.011 using the accelerated failure time model, which confirmed a significant difference between responders and non-responders. In contrast, the standard approach yielded a p-value of 0.194 with a hazard ratio of 0.823 using the Cox proportional hazards model and a p-value of 0.240 using the accelerated failure time model, indicating that the responders and non-responders it identified do not differ significantly in survival.
In summary, our novel prediction method can successfully segment new patients into responders and non-responders. Clinicians can use its predictions to decide whether a patient should switch to a different treatment or stay on the current one.