Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands.
PLoS Comput Biol. 2019 Feb 6;15(2):e1006657. doi: 10.1371/journal.pcbi.1006657. eCollection 2019 Feb.
Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve performance, stability and interpretability of identified marker genes. Problematically, reports on the efficacy of NOPs are conflicting; some suggest, for instance, that random networks perform on par with networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes, resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples. For this purpose, we used cross-study validation, which more closely emulates real-world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks, and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased toward well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and genes related to, e.g., histological grade and tamoxifen resistance, suggesting a role in determining breast cancer outcome.
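The cross-study validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes only that each sample carries a label naming the study it came from, and it holds out one whole study per fold so that a model is always tested on a study it never saw during training. The function name `cross_study_splits` and the toy study labels are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from collections import defaultdict


def cross_study_splits(study_ids):
    """Yield (held_out_study, train_idx, test_idx), one fold per study.

    All samples from the held-out study form the test set; samples from
    every other study form the training set.
    """
    # Group sample indices by the study they belong to.
    by_study = defaultdict(list)
    for idx, study in enumerate(study_ids):
        by_study[study].append(idx)

    # Hold out each study in turn (sorted for a reproducible fold order).
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = sorted(
            i
            for study, indices in by_study.items()
            if study != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy input: the study of origin for seven samples.
study_ids = ["A", "A", "B", "C", "B", "C", "C"]
splits = list(cross_study_splits(study_ids))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Unlike ordinary k-fold cross-validation, which mixes samples from all studies in every fold, this split keeps study-specific batch effects out of the training data for the held-out study. That is what makes it closer to applying a predictor to a new cohort.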