Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
PLoS One. 2013 Dec 2;8(12):e82144. doi: 10.1371/journal.pone.0082144. eCollection 2013.
Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard method for determining this status, immunohistochemical (IHC) analysis of formalin-fixed, paraffin-embedded samples, suffers from numerous technical and reproducibility issues. Assessing ER-status from RNA expression can provide more objective, quantitative and reproducible test results.
To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors. The ER-status of each tumor was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin-fixed tumor tissue.
This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status.
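To illustrate how a cross-validation figure reported as mean ± spread can be computed, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the three features stand in for unnamed genes, the nearest-centroid rule is a simple stand-in for the unspecified machine learning tool, and the choice of 10 folds with a sample standard deviation is a guess at how a ± value might be derived, not the procedure used in the study.

```python
import random
import statistics


def make_synthetic_tumors(n, seed=0):
    """Build n synthetic samples with 3 hypothetical expression values each."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        label = i % 2  # 1 = ER-positive, 0 = ER-negative (synthetic labels)
        shift = 2.0 if label else 0.0  # assumed class separation
        features = [rng.gauss(shift, 1.0) for _ in range(3)]
        samples.append((features, label))
    rng.shuffle(samples)
    return samples


def fit_centroids(train):
    """Compute the per-class mean of each feature."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cross_validate(samples, k=10):
    """Return mean and sample standard deviation of per-fold accuracy."""
    folds = [samples[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = fit_centroids(train)
        correct = sum(predict(model, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return statistics.fmean(accuracies), statistics.stdev(accuracies)


samples = make_synthetic_tumors(176)
mean_acc, sd_acc = cross_validate(samples, k=10)
print(f"cross-validation accuracy: {100 * mean_acc:.2f} ± {100 * sd_acc:.2f}%")
```

Because each fold is held out while the model is fitted on the remaining folds, the per-fold accuracies estimate performance on unseen tumors; their mean and spread give a summary in the same form as the figure quoted above.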
Our efficient and parsimonious classifier lends itself to highly accurate, low-cost RNA-based assessment of ER-status that is suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.