Department of Pathology, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, United States of America.
PLoS Comput Biol. 2013;9(1):e1002875. doi: 10.1371/journal.pcbi.1002875. Epub 2013 Jan 24.
A major goal in translational cancer research is to identify biological signatures driving cancer progression and metastasis. A common technique in genomics research is to cluster patients using gene expression data from a candidate prognostic gene set and, if the resulting clusters show statistically significant outcome stratification, to associate the gene set with prognosis, suggesting its biological and clinical importance. Recent work has questioned the validity of this approach by showing in several breast cancer data sets that "random" gene sets tend to cluster patients into prognostically variable subgroups. This work suggests that new, rigorous statistical methods are needed to identify biologically informative prognostic gene sets. To address this problem, we developed Significance Analysis of Prognostic Signatures (SAPS), which integrates standard prognostic tests with a new prognostic significance test based on stratifying patients into prognostic subtypes with random gene sets. SAPS ensures that a significant gene set not only stratifies patients into prognostically variable groups, but is also enriched for genes showing strong univariate associations with patient prognosis and performs significantly better than random gene sets. We use SAPS to perform a large meta-analysis (the largest completed to date) of prognostic pathways in breast and ovarian cancer and their molecular subtypes. Our analyses show that only a small subset of the gene sets found statistically significant using standard measures achieve significance by SAPS. We identify new prognostic signatures in breast and ovarian cancer and their corresponding molecular subtypes, and we show that prognostic signatures in ER-negative breast cancer are more similar to prognostic signatures in ovarian cancer than to prognostic signatures in ER-positive breast cancer. SAPS is a powerful new method for deriving robust prognostic biological signatures from clinically annotated genomic data sets.
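The abstract does not spell out how SAPS is implemented, so the following is only a minimal Python sketch of one of its ideas: comparing a candidate gene set against random gene sets of the same size. Everything here is an assumption made for illustration. The function names (`stratification_score`, `random_set_p_value`, `make_toy_data`), the toy data, and the scoring rule are hypothetical. In particular, the score splits patients at the median of the gene set's mean expression and takes the difference in mean outcome, which stands in for the clustering and survival testing a real analysis would use, and it ignores censoring.

```python
import random


def stratification_score(expr, genes, outcome):
    """Hypothetical stand-in for 'cluster patients, then test outcome difference'.

    expr maps gene name -> list of per-patient expression values.
    Patients are split at the median of the gene set's mean expression, and the
    score is the absolute difference in mean outcome between the two groups.
    """
    n = len(outcome)
    # Per-patient summary: mean expression across the gene set.
    summary = [sum(expr[g][p] for g in genes) / len(genes) for p in range(n)]
    cut = sorted(summary)[n // 2]
    high = [outcome[p] for p in range(n) if summary[p] >= cut]
    low = [outcome[p] for p in range(n) if summary[p] < cut]
    if not high or not low:
        return 0.0
    return abs(sum(high) / len(high) - sum(low) / len(low))


def random_set_p_value(expr, genes, outcome, n_random=1000, seed=0):
    """Empirical p-value of a gene set against same-size random gene sets.

    Counts how often a random gene set stratifies patients at least as well
    as the candidate; the +1 terms keep the estimate away from exactly zero.
    """
    rng = random.Random(seed)
    all_genes = sorted(expr)
    observed = stratification_score(expr, genes, outcome)
    hits = 0
    for _ in range(n_random):
        random_genes = rng.sample(all_genes, len(genes))
        if stratification_score(expr, random_genes, outcome) >= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)


def make_toy_data(n_patients=60, n_genes=200, n_informative=10, seed=1):
    """Synthetic data (not from the paper): the first n_informative genes
    track a hidden risk score that also drives the outcome."""
    rng = random.Random(seed)
    risk = [rng.gauss(0, 1) for _ in range(n_patients)]
    outcome = [10 - 2 * r + rng.gauss(0, 1) for r in risk]
    expr = {}
    for i in range(n_genes):
        if i < n_informative:
            expr[f"g{i}"] = [r + rng.gauss(0, 0.5) for r in risk]
        else:
            expr[f"g{i}"] = [rng.gauss(0, 1) for _ in range(n_patients)]
    return expr, outcome


if __name__ == "__main__":
    expr, outcome = make_toy_data()
    candidate = [f"g{i}" for i in range(10)]  # the informative toy genes
    print(random_set_p_value(expr, candidate, outcome, n_random=500))
```

In a real analysis the score would presumably be replaced by a proper survival statistic computed on clustered patients, and this random-set comparison would be only one component alongside the standard prognostic test and the enrichment test that the abstract describes.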