Cambridge Research Institute, Cancer Research UK, Cambridge, United Kingdom.
PLoS Comput Biol. 2011 Oct;7(10):e1002227. doi: 10.1371/journal.pcbi.1002227. Epub 2011 Oct 20.
Different data types can offer complementary perspectives on the same biological phenomenon. In cancer studies, for example, data on copy number alterations indicate losses and amplifications of genomic regions in tumours, while transcriptomic data point to the impact of genomic and environmental events on the internal wiring of the cell. Fusing different data types provides a more comprehensive model of the cancer cell than that offered by any single type. However, biological signals in different patients exhibit varying degrees of concordance due to cancer heterogeneity and inherent noise in the measurements. This is a particularly important issue in cancer subtype discovery, where personalised strategies to guide therapy are of vital importance. We present a nonparametric Bayesian model for discovering prognostic cancer subtypes by integrating gene expression and copy number variation data. Our model is constructed from a hierarchy of Dirichlet Processes and addresses three key challenges in data fusion: (i) separating concordant from discordant signals, (ii) selecting informative features, and (iii) estimating the number of disease subtypes. Concordance of signals is assessed individually for each patient, giving us an additional level of insight into the underlying disease structure. We exemplify the power of our model in prostate cancer and breast cancer and show that it outperforms competing methods. In the prostate cancer data, we identify an entirely new subtype with extremely poor survival outcome and show how other analyses fail to detect it. In the breast cancer data, we find subtypes with superior prognostic value by using the concordant results. These discoveries were crucially dependent on our model's ability to distinguish concordant from discordant signals within each patient sample, and would otherwise have been missed. We therefore demonstrate the importance of taking a patient-specific approach, using highly flexible nonparametric Bayesian methods.
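As a rough illustration of why a Dirichlet Process prior allows the number of subtypes to be estimated rather than fixed in advance, the sketch below draws cluster labels from a Chinese restaurant process, the sequential sampling view of a single Dirichlet Process. It is not the hierarchical model described in the abstract: it uses no expression or copy number data and has no concordance or feature selection component. The function name `sample_crp_assignments` and the parameters `n_patients` and `alpha` are assumptions chosen for this example only.

```python
import random


def sample_crp_assignments(n_patients, alpha, seed=0):
    """Draw cluster labels from a Chinese restaurant process (illustrative only).

    alpha is the concentration parameter (must be > 0); larger values
    tend to produce more clusters. No data likelihood is involved here.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of patients already assigned to cluster k
    for _ in range(n_patients):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = sample_crp_assignments(n_patients=50, alpha=1.0)
    print("number of clusters drawn:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is an outcome of the draw, not an input. In a full model, each assignment would also be weighted by how well the patient's data fit each cluster.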