Oncology Data Science, Oncology R&D, AstraZeneca, Waltham, MA, USA.
Oncology Data Science, Oncology R&D, AstraZeneca, South San Francisco, CA, USA.
J Transl Med. 2024 Aug 5;22(1):726. doi: 10.1186/s12967-024-05509-9.
Accurate survival prediction for non-small cell lung cancer (NSCLC) patients remains a significant challenge for the scientific and clinical community despite decades of advanced analytics. Addressing this challenge not only helps inform critical aspects of clinical study design and biomarker discovery but also helps ensure that the 'right patient' receives the 'right treatment'. However, survival prediction is a highly complex task, given the large number of 'omics' and clinical features, as well as the many degrees of freedom that drive patient survival. Prior knowledge could play a critical role in unraveling the complexity of a disease and in understanding the factors that drive a patient's survival. We introduce a methodology for incorporating prior knowledge into machine learning-based models of patient survival through knowledge graphs, and we demonstrate the advantage of this approach for NSCLC patients. Using data from patients treated with immuno-oncologic therapies in the POPLAR (NCT01903993) and OAK (NCT02008227) clinical trials, we found that models based on knowledge graphs yielded significantly improved hazard ratios, including in the POPLAR cohort, compared with models based on the biomarker tumor mutation burden. Use of a model-defined mutational 10-gene signature led to significant overall survival differentiation in both trials. We provide parameterized code for incorporating knowledge graphs into survival analyses for use by the wider scientific community.