Data Science Institute, University of Galway, University Road, H91 TK33, Co. Galway, Ireland.
Comput Biol Med. 2024 May;174:108398. doi: 10.1016/j.compbiomed.2024.108398. Epub 2024 Apr 3.
The recurrence of low-stage lung cancer is a challenge because it is unpredictable and patients respond to treatment in diverse ways. Personalized care and patient outcomes depend heavily on identifying relapse early, yet current predictive models, despite their potential, lack comprehensive genetic data. This gap motivates our research focus: integrating specific genetic information, such as pathway scores, with clinical data. Our aim is to refine machine learning models for more precise relapse prediction in early-stage non-small cell lung cancer. To address the scarcity of genetic data, we use imputation techniques that draw on publicly available datasets such as The Cancer Genome Atlas (TCGA) to add pathway scores to our patient cohort from the Cancer Long Survivor Artificial Intelligence Follow-up (CLARIFY) project. By combining pathway scores imputed from the TCGA dataset with clinical data, our approach makes notable progress in predicting relapse on a held-out test set of 200 patients. Training machine learning models on knowledge graph data enriched with triples derived from pathway score imputation, we achieve a promising precision of 82% and specificity of 91%. These results highlight the potential of our models as supplementary tools to the tumour, node, and metastasis (TNM) classification system, offering improved prognostic capabilities for lung cancer patients. In summary, our research underscores the value of refining machine learning models for relapse prediction in early-stage non-small cell lung cancer. Our approach, which imputes pathway scores and integrates them with clinical data, not only improves predictive performance but also demonstrates the promising role of machine learning in anticipating relapse and, ultimately, improving patient outcomes.
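The abstract does not state which imputation method or triple schema was used, so the following is only a minimal sketch of one plausible pipeline under explicit assumptions. It swaps in k-nearest-neighbour imputation: a target patient's pathway scores are estimated as the mean scores of the k reference (TCGA-like) patients closest to it in a shared numeric clinical-feature space. Each imputed score is then emitted as a (patient, predicate, score) triple, and the two reported metrics, precision and specificity, are computed from binary relapse predictions. All function names, the `has_<pathway>_score` predicate, the pathway names, and the toy values are hypothetical and are not taken from the CLARIFY project.

```python
import math
from statistics import mean


def impute_pathway_scores(target_features, reference, k=3):
    """Hypothetical kNN imputation (an assumption, not the paper's method).

    reference: list of (features, pathway_scores) pairs, where features is a
    list of floats and pathway_scores is a dict {pathway_name: score}.
    Returns the mean score per pathway over the k nearest reference patients.
    """
    # Rank reference patients by Euclidean distance in clinical-feature space.
    ranked = sorted(reference, key=lambda item: math.dist(target_features, item[0]))
    neighbours = [scores for _, scores in ranked[:k]]
    pathways = neighbours[0].keys()
    return {p: mean(n[p] for n in neighbours) for p in pathways}


def to_triples(patient_id, pathway_scores):
    """Emit one (subject, predicate, object) triple per imputed score.

    The predicate naming scheme is hypothetical, not the CLARIFY schema.
    """
    return [
        (patient_id, f"has_{p}_score", round(s, 3))
        for p, s in sorted(pathway_scores.items())
    ]


def precision(y_true, y_pred):
    """TP / (TP + FP), with relapse encoded as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def specificity(y_true, y_pred):
    """TN / (TN + FP): share of non-relapse patients predicted as non-relapse."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp) if tn + fp else 0.0


if __name__ == "__main__":
    # Toy reference cohort: ([age, hypothetical encoded stage], pathway scores).
    reference = [
        ([60.0, 1.0], {"EGFR": 0.8, "MAPK": 0.2}),
        ([62.0, 1.0], {"EGFR": 0.6, "MAPK": 0.4}),
        ([45.0, 0.0], {"EGFR": 0.1, "MAPK": 0.9}),
        ([70.0, 2.0], {"EGFR": 0.5, "MAPK": 0.5}),
    ]
    scores = impute_pathway_scores([61.0, 1.0], reference, k=2)
    print(to_triples("patient_001", scores))

    # Toy relapse labels (1 = relapse) and model predictions.
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(precision(y_true, y_pred), specificity(y_true, y_pred))
```

Averaging neighbours is only one simple way to borrow information from a reference cohort; a regression model trained on the reference data would be an equally plausible choice, and the abstract does not say which was used.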