National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
Bioinformatics. 2016 Jun 1;32(11):1724-32. doi: 10.1093/bioinformatics/btw059. Epub 2016 Feb 1.
The relationship between genomic factors and the response to diverse cancer drugs remains unclear. A number of studies have shown that patients' heterogeneous responses to anticancer treatments are partly associated with specific changes in gene expression and somatic alterations. Emerging large-scale pharmacogenomic data provide valuable opportunities to improve existing therapies or to guide early-phase clinical trials of compounds under development. However, identifying the underlying combinatorial patterns in pharmacogenomic data is still a challenging issue.
In this study, we adopted a sparse network-regularized partial least square (SNPLS) method to identify joint modular patterns from large-scale paired gene-expression and drug-response data. We incorporated a molecular network into the (sparse) partial least square model to improve module accuracy via a network-based penalty. We first demonstrated the effectiveness of SNPLS on a set of simulated data and compared it with two typical methods. We then applied it to gene expression profiles for 13 321 genes and pharmacological profiles for 98 anticancer drugs across 641 cancer cell lines covering diverse types of human cancer. We identified 20 gene-drug co-modules, each consisting of 30 cell lines, 137 genes and 2 drugs on average. The majority of the identified co-modules have significant functional implications and coordinated gene-drug associations. This modular analysis provides new insights into the molecular mechanisms of drug action and suggests new drug targets for the therapy of certain types of cancer.
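To make the idea of a sparse, network-penalized partial least square model concrete, the following Python sketch fits a single gene-drug component on a small simulated dataset. It is an illustration under stated assumptions, not the published SNPLS algorithm or the authors' MATLAB package: the abstract does not give the objective or the solver, so the alternating soft-thresholding update, the graph-Laplacian penalty, and every name here (`snpls_sketch`, `simulate_planted_module`, `lam_u`, `lam_v`, `lam_net`) are hypothetical choices made for this example.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft-thresholding, the usual L1 proximal step."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def normalize(z):
    """Scale to unit Euclidean norm; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z

def snpls_sketch(X, Y, L, lam_u=0.5, lam_v=0.5, lam_net=1.0, n_iter=100, tol=1e-8):
    """Heuristic one-component sparse PLS with a Laplacian penalty on gene weights.

    X: samples x genes, Y: samples x drugs, L: genes x genes graph Laplacian.
    lam_u and lam_v are thresholds expressed as a fraction of the largest
    absolute entry (an assumption made to keep the demo scale-free).
    """
    M = X.T @ Y                                  # gene-by-drug cross-covariance
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]                        # start from ordinary PLS weights
    for _ in range(n_iter):
        u_old = u
        # Covariance term minus the gradient direction of the network penalty u' L u.
        z_u = M @ v - lam_net * (L @ u)
        u = normalize(soft_threshold(z_u, lam_u * np.max(np.abs(z_u))))
        z_v = M.T @ u
        v = normalize(soft_threshold(z_v, lam_v * np.max(np.abs(z_v))))
        if np.linalg.norm(u - u_old) < tol:
            break
    return u, v

def simulate_planted_module(n=60, p=40, q=10, n_genes=8, n_drugs=3, seed=0):
    """Toy data: the first n_genes genes and n_drugs drugs share one latent factor."""
    rng = np.random.default_rng(seed)
    t = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    X[:, :n_genes] = t[:, None] + 0.3 * rng.standard_normal((n, n_genes))
    Y[:, :n_drugs] = t[:, None] + 0.3 * rng.standard_normal((n, n_drugs))
    X -= X.mean(axis=0)
    Y -= Y.mean(axis=0)
    # Chain network linking the module genes; Laplacian L = D - A.
    A = np.zeros((p, p))
    for i in range(n_genes - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    return X, Y, L

X, Y, L = simulate_planted_module()
u, v = snpls_sketch(X, Y, L)
print("selected genes:", np.flatnonzero(u).tolist())
print("selected drugs:", np.flatnonzero(v).tolist())
```

In this toy setting the nonzero entries of `u` and `v` define one gene-drug co-module. Which cell lines to assign to a module, and how to extract further modules, are separate steps that this sketch does not attempt.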
A MATLAB package for SNPLS is available at http://page.amss.ac.cn/shihua.zhang/
Supplementary data are available at Bioinformatics online.