Nikolova Olga, Moser Russell, Kemp Christopher, Gönen Mehmet, Margolin Adam A
Computational Biology Program, Oregon Health and Science University, Portland, OR 97239, USA.
Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Bioinformatics. 2017 May 1;33(9):1362-1369. doi: 10.1093/bioinformatics/btw836.
In recent years, vast advances in biomedical technologies and comprehensive sequencing have revealed the genomic landscape of common forms of human cancer in unprecedented detail. The broad heterogeneity of the disease calls for rapid development of personalized therapies. Translating the readily available genomic data into useful knowledge that can be applied in the clinic remains a challenge. Computational methods are needed to aid these efforts by robustly analyzing genome-scale data from distinct experimental platforms for prioritization of targets and treatments.
We propose a novel, biologically motivated, Bayesian multitask approach, which explicitly models gene-centric dependencies across multiple and distinct genomic platforms. We introduce a gene-wise prior and present a fully Bayesian formulation of a group factor analysis model. In supervised prediction applications, our multitask approach leverages similarities in response profiles of groups of drugs that are more likely to be related to true biological signal, which leads to more robust performance and improved generalization ability. We evaluate the performance of our method on molecularly characterized collections of cell lines profiled against two compound panels, namely the Cancer Cell Line Encyclopedia and the Cancer Therapeutics Response Portal. We demonstrate that accounting for the gene-centric dependencies enables leveraging information from multi-omic input data and improves prediction and feature selection performance. We further demonstrate the applicability of our method in an unsupervised dimensionality reduction application by inferring genes essential to tumorigenesis in the pancreatic ductal adenocarcinoma and lung adenocarcinoma patient cohorts from The Cancer Genome Atlas.
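As an illustration of the model family named above, the following is a minimal sketch of the generative side of a generic group factor analysis model, in which several data views share one set of latent factors and each factor may be switched on or off per view. It is not the authors' implementation and does not reproduce their gene-wise prior or their inference procedure. The function name `simulate_gfa`, the binary `activity` matrix, the view sizes, and the noise level are all assumptions made for this example.

```python
import numpy as np


def simulate_gfa(n_samples, view_dims, n_factors, activity, noise_sd=0.1, seed=0):
    """Draw synthetic data from a generic group factor analysis model.

    Hypothetical helper for illustration only.
    activity[m][k] == 1 means latent factor k is active in view m.
    """
    rng = np.random.default_rng(seed)
    activity = np.asarray(activity, dtype=float)

    # Latent factors shared by every view: one row per sample.
    Z = rng.standard_normal((n_samples, n_factors))

    loadings, views = [], []
    for m, dim in enumerate(view_dims):
        # View-specific loadings; columns of inactive factors are zeroed,
        # which is the group-wise sparsity that defines the model family.
        W = rng.standard_normal((dim, n_factors)) * activity[m]
        # Observed view = shared factors projected through W, plus noise.
        X = Z @ W.T + noise_sd * rng.standard_normal((n_samples, dim))
        loadings.append(W)
        views.append(X)
    return Z, loadings, views


# Assumed toy setup: three views, e.g. two genomic platforms and a
# drug-response panel, measured on the same 50 samples.
activity = [
    [1, 1, 0],  # view 0 uses factors 0 and 1
    [1, 0, 1],  # view 1 uses factors 0 and 2
    [1, 1, 1],  # view 2 uses all three factors
]
Z, loadings, views = simulate_gfa(50, [20, 15, 5], 3, activity)
for m, X in enumerate(views):
    print(f"view {m}: shape {X.shape}")
```

In this sketch a factor that is active in both an input view and the response view is what lets information flow between them. A fully Bayesian treatment would place priors on the loadings and infer the activity pattern from data rather than fixing it in advance.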
The code for this work is available at https://github.com/olganikolova/gbgfa.
Contact: nikolova@ohsu.edu or margolin@ohsu.edu.
Supplementary data are available at Bioinformatics online.