Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, London, UK.
St. John's Institute of Dermatology, School of Basic and Medical Biosciences, King's College London, London, UK.
Br J Cancer. 2021 Aug;125(5):748-758. doi: 10.1038/s41416-021-01455-1. Epub 2021 Jun 15.
Prognostic stratification of breast cancers remains a challenge for improving clinical decision making. We apply machine learning to breast cancer transcriptomic data from multiple studies to link the expression of specific genes to histological grade and to classify tumours into more or less aggressive prognostic types.
Microarray data from 5031 untreated breast tumours spanning 33 published datasets were integrated with the corresponding clinical data. A machine learning model based on gradient boosted trees was trained on histological grade-1 and grade-3 samples. The resulting predictive model (Cancer Grade Model, CGM) was then applied to grade-2 and unknown-grade samples (n = 3029) for prognostic risk classification.
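The training scheme (fit on the grade extremes, then score the intermediate and unknown grades) can be illustrated with a minimal, self-contained sketch. The abstract does not specify the library, tree depth, or hyperparameters used for CGM, so this sketch assumes a simplified form of gradient boosting: depth-1 regression trees ("stumps") fitted to the negative gradient of the logistic loss. The class `GradientBoostedStumps`, its parameters, the 0.5 probability cut-off, and the two-gene toy data are all hypothetical. Grade-1 tumours are labelled 0 and grade-3 tumours 1.

```python
import math
from typing import List, Optional, Tuple

# A depth-1 regression tree ("stump"): (feature index, threshold, left value, right value)
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_stump(stump: Stump, x: List[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Fit a stump to the residuals r by minimising squared error."""
    best: Optional[Stump] = None
    best_err = float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if err < best_err:
                best_err, best = err, (j, t, lv, rv)
    if best is None:  # all features constant: predict the mean residual everywhere
        m = sum(r) / len(r)
        best = (0, float("inf"), m, m)
    return best


class GradientBoostedStumps:
    """Hypothetical binary classifier: gradient boosting on logistic loss with stumps."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.f0 = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[List[float]], y: List[int]) -> "GradientBoostedStumps":
        p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.f0 = math.log(p / (1 - p))  # start from the log-odds of the class prior
        F = [self.f0] * len(X)
        self.stumps = []
        for _ in range(self.n_rounds):
            # Negative gradient of the logistic loss with respect to the raw score F.
            residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            F = [fi + self.learning_rate * predict_stump(stump, x) for fi, x in zip(F, X)]
        return self

    def predict_proba(self, x: List[float]) -> float:
        score = self.f0 + sum(self.learning_rate * predict_stump(s, x) for s in self.stumps)
        return sigmoid(score)


# Hypothetical toy data: two "gene expression" values per tumour.
grade1 = [[1.0, 5.0], [1.2, 4.8], [0.8, 5.1], [1.1, 5.3]]  # label 0
grade3 = [[3.0, 5.0], [3.2, 4.9], [2.9, 5.2], [3.1, 4.7]]  # label 1
model = GradientBoostedStumps().fit(grade1 + grade3, [0] * 4 + [1] * 4)

# Score grade-2 tumours, which were never used for training.
grade2 = [[1.3, 5.0], [2.8, 5.1]]
risk = ["high" if model.predict_proba(x) >= 0.5 else "low" for x in grade2]
print(risk)  # prints ['low', 'high']
```

Training only on the extremes gives the classifier unambiguous labels. Grade-2 and unknown-grade tumours are never seen during fitting and are only scored, which mirrors the split described above.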
A 70-gene signature for assessing clinical risk was identified and achieved 90% accuracy when tested on samples of known histological grade. Survival analysis validated the predictive framework, which showed robust prognostic performance. When cross-referenced with existing genomic tests, CGM demonstrated competitive power for predicting tumour risk.
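Survival analysis of predicted risk groups is commonly carried out with Kaplan-Meier estimates. The sketch below is an assumption about what such a validation could look like, not the study's procedure: `kaplan_meier`, `survival_at`, and the follow-up times and event indicators are hypothetical. Each risk group gets its own survival curve, and a better-separated pair of curves indicates stronger prognostic stratification.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival probability) at each distinct event time.

    events[i] is 1 if the event (e.g. relapse or death) was observed and 0 if the
    follow-up was censored at times[i].
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # both events and censored subjects leave the risk set
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: survival probability at time t (1.0 before the first event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical follow-up (years) for tumours a classifier labelled low or high risk.
low_curve = kaplan_meier([5, 6, 7, 8, 9, 10], [0, 1, 0, 0, 1, 0])
high_curve = kaplan_meier([1, 2, 3, 4, 5, 6], [1, 1, 0, 1, 1, 0])
print(survival_at(low_curve, 5), survival_at(high_curve, 5))
```

In practice the separation between curves is usually tested formally, for example with a log-rank test, which this sketch omits.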
CGM classifies tumours into better-defined prognostic categories without using information on tumour size, stage, or subgroup. The model offers a means to improve prognostic assessment and to support clinical decision making and precision treatment, potentially helping to prevent underdiagnosis of high-risk tumours and to minimise over-treatment of low-risk disease.