Department of Cancer and DNA Damage Responses, Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Rd, Berkeley 94720, CA, USA ; Current affiliation: Department of Medicine, Division of Oncology, The Genome Institute, Washington University, Campus Box 8501, 4444 Forest Park Ave, St. Louis 63108, MO, USA.
Department of Cancer and DNA Damage Responses, Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Rd, Berkeley 94720, CA, USA ; Current affiliation: Sequenta Inc., 400 East Jamie Court, Suite 301, South San Francisco 94080, CA, USA.
Genome Med. 2013 Oct 11;5(10):92. doi: 10.1186/gm496. eCollection 2013.
Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients who would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would relapse after surgery alone, the challenge is to distinguish high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment-related toxicities and costs).
We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene, mRNA transcription-based predictor of relapse-free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, by testing on multiple independent datasets, and by comparison to existing algorithms, using receiver operating characteristic (ROC) and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates.
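The redundancy step can be illustrated with a minimal sketch. It assumes that genes are clustered by their expression profiles across samples and that the most important gene in each cluster becomes the primary, with the remaining cluster members as alternates. That selection rule, the function names, the gene names, and the toy numbers are all hypothetical and are not taken from the study.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. Centroids start at the first k points,
    which keeps this toy example deterministic (an illustrative choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def primaries_with_alternates(profiles, importance, k):
    """Cluster genes by expression profile; in each cluster the gene with the
    highest importance is the primary and the others are its alternates."""
    genes = list(profiles)
    labels = kmeans([profiles[g] for g in genes], k)
    signature = {}
    for c in range(k):
        cluster = sorted((g for g, lab in zip(genes, labels) if lab == c),
                         key=lambda g: importance[g], reverse=True)
        if cluster:
            signature[cluster[0]] = cluster[1:]
    return signature

# Hypothetical expression profiles (one value per sample) and importances.
profiles = {
    "geneA": [1.0, 1.1, 0.9, 1.0],
    "geneB": [5.0, 5.2, 4.9, 5.1],
    "geneC": [1.1, 1.0, 1.0, 0.9],
    "geneD": [5.1, 5.0, 5.0, 5.2],
    "geneE": [0.9, 1.0, 1.1, 1.0],
}
importance = {"geneA": 0.10, "geneB": 0.30, "geneC": 0.25,
              "geneD": 0.05, "geneE": 0.15}

signature = primaries_with_alternates(profiles, importance, k=2)
print(signature)
```

In this toy case the two clusters yield two primary genes. If a primary cannot be measured on a given platform, one of its alternates from the same cluster could stand in for it.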
Internal out-of-bag (OOB) cross-validation of the initial (full-gene-set) model on the training data gave an ROC AUC of 0.704, which was comparable to or better than values reported previously or obtained by applying existing methods to our dataset. Three risk groups (low, intermediate, and high) were defined by probability cutoffs. Survival analysis showed a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective in both hormone-treated and untreated patients.
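As a rough illustration of the two quantities discussed above, the sketch below computes an ROC AUC from predicted relapse probabilities and assigns each patient to a risk group. The cutoffs (0.3 and 0.6), the function names, and the toy data are assumptions chosen for illustration. The abstract does not state the actual cutoffs, and the reported AUC of 0.704 comes from the study's own data, not from this example.

```python
def roc_auc(scores, relapsed):
    """ROC AUC computed as the probability that a randomly chosen relapsed
    patient has a higher score than a randomly chosen non-relapsed patient
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, relapsed) if y]
    neg = [s for s, y in zip(scores, relapsed) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def risk_group(prob, low_cut=0.3, high_cut=0.6):
    """Map a relapse probability to a risk group.
    The default cutoffs are hypothetical, not the published ones."""
    if prob < low_cut:
        return "low"
    if prob < high_cut:
        return "intermediate"
    return "high"

# Toy predicted relapse probabilities and observed outcomes (1 = relapse).
scores = [0.1, 0.2, 0.35, 0.5, 0.7, 0.9]
relapsed = [0, 0, 1, 0, 1, 1]

auc = roc_auc(scores, relapsed)
groups = [risk_group(s) for s in scores]
print(round(auc, 3), groups)
```

The pairwise formulation is quadratic in the number of patients, which is fine for a sketch; a rank-based computation would be preferable for large cohorts.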
RFRS allows flexibility in both the number and identity of the genes used, ranging from thousands down to as few as 17 or 8 genes, each with multiple alternates. The RFRS reports a probability score that is strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.