Laboratory of Oncology, Institute of Medicine and Experimental Biology of Cuyo (IMBECU), National Scientific and Technical Research Council (CONICET), Mendoza 5500, Argentina.
Institute of Biochemistry and Biotechnology, School of Medicine, National University of Cuyo, Mendoza 5500, Argentina.
Bioinformatics. 2020 Dec 22;36(20):5037-5044. doi: 10.1093/bioinformatics/btaa619.
Statistical and machine-learning analyses of tumor transcriptomic profiles offer a powerful resource for gaining a deeper understanding of tumor subtypes and disease prognosis. Currently, prognostic gene-expression signatures do not exist for all cancer types, and most of those developed to date have been optimized for individual tumor types. In Galgo, we implement a bi-objective optimization approach that prioritizes gene signature cohesiveness and patient survival in parallel, which provides greater power to identify tumor transcriptomic phenotypes strongly associated with patient survival.
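As a rough illustration of what optimizing two objectives in parallel means, the sketch below filters a set of candidate signatures down to those that are not dominated on either objective (a Pareto front). It is not Galgo's implementation, which is an R package. The function names, the score pairs and the assumption that both objectives are maximized are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

# Each candidate signature is scored on two objectives, both assumed
# here to be "higher is better":
#   (cluster cohesiveness, survival separation between subtypes)
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is at least as good as b on both objectives
    and strictly better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(scores: Sequence[Score]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for five candidate signatures (illustrative values only).
candidates: List[Score] = [
    (0.62, 1.8),
    (0.55, 2.4),
    (0.50, 1.5),
    (0.70, 1.1),
    (0.55, 2.0),
]

front = pareto_front(candidates)
print(front)
```

Candidates on the front represent different trade-offs between cohesiveness and survival separation. A bi-objective search keeps such a set rather than collapsing the two criteria into a single score.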
To compare the predictive power of the signatures obtained by Galgo with that of previously studied subtyping methods, we used a meta-analytic approach, testing a total of 35 large population-based transcriptomic biobanks covering four different cancer types. Galgo-generated colorectal and lung adenocarcinoma signatures were stronger predictors of patient survival than published molecular classification schemes. One Galgo-generated breast cancer signature outperformed the PAM50, AIMS, SCMGENE and IntClust subtyping predictors. In high-grade serous ovarian cancer, Galgo signatures achieved predictive power similar to that of a consensus classification method. In all cases, Galgo subtypes reflected enrichment of gene sets related to the hallmarks of the disease, which highlights the biological relevance of the partitions found.
The open-source R package is available at www.github.com/harpomaxx/galgo.
Supplementary data are available at Bioinformatics online.