Bueno-Fortes Santiago, Berral-Gonzalez Alberto, Sánchez-Santos José Manuel, Martin-Merino Manuel, De Las Rivas Javier
Cancer Research Center (CiC-IMBCC, CSIC/USAL and IBSAL), Consejo Superior de Investigaciones Científicas (CSIC) and University of Salamanca (USAL), Salamanca 37007, Spain.
Department of Statistics, University of Salamanca (USAL), Salamanca 37008, Spain.
Bioinform Adv. 2023 Mar 22;3(1):vbad037. doi: 10.1093/bioadv/vbad037. eCollection 2023.
Modern genomic technologies allow genome-wide analyses to find gene markers associated with risk and survival in cancer patients. Accurate risk prediction and patient stratification based on robust gene signatures are a key path forward in personalized treatment and precision medicine. Several authors have proposed gene signatures to assign risk in patients with breast cancer (BRCA), and some of these signatures have been implemented in commercial platforms used in the clinic, such as Oncotype and Prosigna. However, these platforms are black boxes: the influence of the selected genes as survival markers is unclear, and the risk scores they provide cannot be clearly related to the standard clinicopathological tumor markers obtained by immunohistochemistry (IHC), which guide clinical and therapeutic decisions in breast cancer.
Here, we present a framework to discover a robust list of gene expression markers associated with survival that can be biologically interpreted in terms of the three main biomolecular factors that define clinical outcome in BRCA (the IHC clinical markers: estrogen receptor, ER; progesterone receptor, PR; and human epidermal growth factor receptor 2, HER2). To test and ensure the reproducibility of the results, we compiled and analyzed two independent datasets with a large number of tumor samples (1024 and 879) that include full genome-wide expression profiles and survival data. Using these two cohorts, we obtained a robust subset of gene survival markers that correlate well with the major IHC clinical markers used in breast cancer. The gene set of survival markers that we identify (34 genes) significantly improves on the risk prediction provided by the gene sets included in the commercial platforms: Oncotype (16 genes) and Prosigna (50 genes, i.e. PAM50). Furthermore, some of the genes identified have recently been proposed in the literature as new prognostic markers and may deserve more attention in current clinical trials to improve breast cancer risk prediction.
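As an illustration of how the risk prediction of two gene sets might be compared, the sketch below computes Harrell's concordance index (C-index) for two hypothetical risk scores on a toy cohort. The C-index is one common measure of survival risk prediction and is an assumption here; it is not necessarily the metric used in this study, and the data, the variable names and the function `concordance_index` are invented for the example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    times       -- follow-up time per patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    patients = list(zip(times, events, risk_scores))
    for (t_a, e_a, r_a), (t_b, e_b, r_b) in combinations(patients, 2):
        if t_a == t_b:
            continue  # simplification: pairs with tied times are skipped
        # Identify the patient with the shorter follow-up time.
        if t_a < t_b:
            early_event, r_early, r_late = e_a, r_a, r_b
        else:
            early_event, r_early, r_late = e_b, r_b, r_a
        if not early_event:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if r_early > r_late:
            concordant += 1.0  # higher risk assigned to the earlier event
        elif r_early == r_late:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy cohort (invented values): five patients, two censored.
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]

# Hypothetical risk scores derived from two different gene sets.
score_set_a = [0.9, 0.7, 0.5, 0.3, 0.1]
score_set_b = [0.1, 0.7, 0.5, 0.9, 0.3]

c_a = concordance_index(times, events, score_set_a)
c_b = concordance_index(times, events, score_set_b)
print(f"C-index, gene set A: {c_a:.3f}")
print(f"C-index, gene set B: {c_b:.3f}")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so under this measure the gene set with the higher value orders patients by risk more accurately. In practice, the score would come from a survival model fitted on one cohort and evaluated on the independent one.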
All data integrated and analyzed in this research will be available on GitHub (https://github.com/jdelasrivas-lab/breastcancersurvsign), including the R scripts and protocols used for the analyses.
Supplementary data are available at Bioinformatics Advances online.