Howard Hughes Medical Institute, UC Santa Cruz, CA, USA.
Bioinformatics. 2010 Jun 15;26(12):i237-45. doi: 10.1093/bioinformatics/btq182.
High-throughput data is providing a comprehensive view of the molecular changes in cancer tissues. New technologies allow for the simultaneous genome-wide assay of the state of genome copy number variation, gene expression, DNA methylation and epigenetics of tumor samples and cancer cell lines. Analyses of current data sets find that genetic alterations can differ between patients but often involve common pathways. It is therefore critical to identify the pathways relevant to cancer progression and to detect how they are altered in different patients.
We present a novel method, PARADIGM, for inferring patient-specific genetic activities by incorporating curated pathway interactions among genes. A gene is modeled by a factor graph as a set of interconnected variables encoding the expression and known activity of a gene and its products, allowing the incorporation of many types of omic data as evidence. The method uses probabilistic inference to predict the degree to which a pathway's activities (e.g. internal gene states, interactions or high-level 'outputs') are altered in the patient. Compared with a competing pathway activity inference approach called SPIA, our method identifies altered activities in cancer-related pathways with fewer false positives in both a glioblastoma multiforme (GBM) and a breast cancer dataset. PARADIGM identified consistent pathway-level activities for subsets of the GBM patients that are overlooked when genes are considered in isolation. Further, grouping GBM patients based on their significant pathway perturbations divides them into clinically relevant subgroups having significantly different survival outcomes. These findings suggest that therapeutics could be chosen to target genes at critical points in the commonly perturbed pathway(s) of a group of patients.
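The factor-graph idea can be illustrated with a toy example. The sketch below is not the PARADIGM implementation: the variable names (`dna`, `mrna`, `protein`, `active`), the three-state encoding, the `EPSILON` value and the `link_factor` function are all hypothetical choices made for illustration. It models a single gene as a chain of linked variables and computes, by brute-force enumeration, the posterior over the gene's activity given observed copy number and expression.

```python
from itertools import product

# Assumed three-state encoding: -1 = decreased, 0 = unchanged, 1 = increased.
STATES = (-1, 0, 1)
# Hypothetical chain of variables for one gene; each influences the next.
CHAIN = ("dna", "mrna", "protein", "active")
# Hypothetical probability that a variable disagrees with its parent.
EPSILON = 0.1


def link_factor(parent, child):
    """Factor that favors a child variable copying its parent's state."""
    return 1.0 - EPSILON if parent == child else EPSILON / 2


def posterior(target, evidence):
    """Posterior over `target` given observed variables, by full enumeration."""
    scores = {state: 0.0 for state in STATES}
    for assignment in product(STATES, repeat=len(CHAIN)):
        values = dict(zip(CHAIN, assignment))
        # Skip assignments that contradict the observed data.
        if any(values[name] != state for name, state in evidence.items()):
            continue
        # The weight of an assignment is the product of all factors.
        weight = 1.0
        for parent, child in zip(CHAIN, CHAIN[1:]):
            weight *= link_factor(values[parent], values[child])
        scores[values[target]] += weight
    total = sum(scores.values())
    return {state: weight / total for state, weight in scores.items()}


# Copy number gain and over-expression observed for the same gene.
amplified = posterior("active", {"dna": 1, "mrna": 1})
# Copy number gain observed, expression unobserved.
copy_only = posterior("active", {"dna": 1})
```

In this toy chain, evidence closer to the activity variable yields a more confident posterior: observing both copy number and expression gives a higher probability of increased activity than observing copy number alone. Enumeration is only feasible for a handful of variables; a pathway-scale model would need a message-passing or other approximate inference method.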
Source code is available at http://sbenz.github.com/Paradigm.
Supplementary data are available at Bioinformatics online.