Unité de Bioinfomique, Institut de Cancérologie de L'Ouest, Bd Jacques Monod, 44805, Saint Herblain Cedex, France; SIRIC ILIAD, Nantes, Angers, France.
Unité de Bioinfomique, Institut de Cancérologie de L'Ouest, Bd Jacques Monod, 44805, Saint Herblain Cedex, France; SIRIC ILIAD, Nantes, Angers, France; Ecole Centrale de Nantes, 1 Rue de La Noë, 44300, Nantes, France; Laboratoire de Mathématiques Jean Leray, BP 92208, 2 Rue de La Houssinière, 44322, Nantes Cedex 03, France.
Comput Biol Med. 2021 Feb;129:104171. doi: 10.1016/j.compbiomed.2020.104171. Epub 2020 Dec 9.
Triple-negative breast cancer (TNBC) heterogeneity represents one of the main obstacles to precision medicine for this disease. Recent concordant transcriptomics studies have shown that TNBC can be divided into at least three subtypes with potential therapeutic implications. Although a few studies have predicted TNBC subtype from transcriptomics data, the resulting subtyping was only partially sensitive and was limited by batch effects and dependence on a given dataset, which may hinder the transition to routine diagnostic testing. Therefore, we sought to build an absolute predictor (i.e., intra-patient diagnosis) based on machine learning algorithms with a limited number of probes. To that end, we first computed binary comparisons between probes within each patient (indicators) and based the predictive analysis on these transformed data. Probe selection combined filter and wrapper methods for variable selection, using cross-validation. We tested three prediction models (random forest, gradient boosting [GB], and extreme gradient boosting) using this optimal subset of indicators as inputs. Nested cross-validation allowed us to select the best model in a consistent manner. The results showed that the fifty selected indicators highlighted the biological characteristics associated with each TNBC subtype. The GB model based on this subset of indicators performed better than the other models.
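As an illustration of the indicator transformation described above, the following minimal Python sketch converts one patient's probe intensities into binary pairwise comparisons. The abstract does not specify the encoding, so the rule used here (an indicator equals 1 when the first probe's intensity exceeds the second's) is an assumption, and the function name `pairwise_indicators` and the probe names are hypothetical placeholders.

```python
from itertools import combinations


def pairwise_indicators(expression, probes=None):
    """Turn one patient's probe intensities into binary comparisons.

    expression: dict mapping probe name -> intensity for a single sample.
    probes: optional ordered list of probe names to compare
            (defaults to all probes, sorted by name).
    Returns a dict mapping "A>B" -> 1 if probe A exceeds probe B, else 0.
    Assumed encoding; the original study may define indicators differently.
    """
    names = sorted(expression) if probes is None else list(probes)
    return {
        f"{a}>{b}": int(expression[a] > expression[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical intensities for one patient (not data from the study).
patient = {"probe_A": 7.2, "probe_B": 5.1, "probe_C": 9.4}
indicators = pairwise_indicators(patient)

# A sample-wide increasing distortion (here, a scale and an offset) leaves
# the within-sample ordering, and therefore every indicator, unchanged.
shifted = {name: 1.8 * value + 3.0 for name, value in patient.items()}
assert pairwise_indicators(shifted) == indicators
```

Because each indicator depends only on the ordering of two probes within the same sample, the transformed features can be computed for a single patient without reference to a cohort, which is one plausible reading of what makes the predictor "absolute".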