Ong Edison, Szedlak Anthony, Kang Yunyi, Smith Peyton, Smith Nicholas, McBride Madison, Finlay Darren, Vuori Kristiina, Mason James, Ball Edward D, Piermarocchi Carlo, Paternostro Giovanni
Salgomed Inc., Del Mar, California.
J Comput Biol. 2015 Apr;22(4):266-88. doi: 10.1089/cmb.2014.0297.
A key aim of systems biology is the reconstruction of molecular networks. We do not yet, however, have networks that integrate information from all datasets available for a particular clinical condition. This is in part due to the limited scalability, in terms of required computational time and power, of existing algorithms. Network reconstruction methods should also be scalable in the sense of allowing scientists from different backgrounds to efficiently integrate additional data. We present a network model of acute myeloid leukemia (AML). In the current version (AML 2.1), we have used gene expression data (both microarray and RNA-seq) from 5 different studies comprising a total of 771 AML samples and a protein-protein interactions dataset. Our scalable network reconstruction method is in part based on the well-known property of gene expression correlation among interacting molecules. The difficulty of distinguishing between direct and indirect interactions is addressed by optimizing the coefficient of variation of gene expression, using a validated gold-standard dataset of direct interactions. Computational time is much reduced compared to other network reconstruction methods. A key feature is the study of the reproducibility of interactions found in independent clinical datasets. An analysis of the most significant clusters, and of the network properties (intraset efficiency, degree, betweenness centrality, and PageRank) of common AML mutations demonstrated the biological significance of the network. A statistical analysis of the response of blast cells from 11 AML patients to a library of kinase inhibitors provided an experimental validation of the network. A combination of network and experimental data identified CDK1, CDK2, CDK4, and CDK6 and other kinases as potential therapeutic targets in AML.
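The abstract describes the reconstruction step only at a high level: protein-protein interaction (PPI) edges are supported by gene expression correlation, a coefficient-of-variation (CV) criterion is tuned against a gold standard of direct interactions, and reproducibility across independent datasets is emphasized. The Python sketch below is one plausible reading of that idea, not the authors' actual algorithm. It assumes that the CV acts as a per-gene filter and that it is "optimized" by choosing the cutoff whose retained edges are most enriched (highest precision) for gold-standard interactions. The function names (`reproducible_edges`, `choose_cv_threshold`), the parameters `r_min`, `cv_min`, and `min_datasets`, and the toy data are all hypothetical and do not come from the AML 2.1 datasets.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; treat as uncorrelated
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def coefficient_of_variation(x):
    """Population standard deviation divided by the absolute mean."""
    m = mean(x)
    return pstdev(x) / abs(m) if m != 0 else 0.0  # zero mean: CV undefined, so fail any positive cutoff


def reproducible_edges(ppi_edges, datasets, r_min=0.5, cv_min=0.1, min_datasets=2):
    """Hypothetical filter: keep PPI edges whose two genes are co-expressed
    (r >= r_min) in at least min_datasets datasets. A dataset only counts if it
    contains both genes and both have CV >= cv_min.
    Returns {edge: number of supporting datasets}."""
    kept = {}
    for a, b in ppi_edges:
        support = 0
        for data in datasets:  # each dataset: gene -> expression values, one per sample
            if a not in data or b not in data:
                continue
            xa, xb = data[a], data[b]
            if min(coefficient_of_variation(xa), coefficient_of_variation(xb)) < cv_min:
                continue
            if pearson(xa, xb) >= r_min:
                support += 1
        if support >= min_datasets:
            kept[(a, b)] = support
    return kept


def choose_cv_threshold(candidates, ppi_edges, datasets, gold_standard, **kw):
    """Hypothetical tuning step: return (cv_min, precision) for the candidate
    cutoff whose retained edges are most enriched for gold-standard edges."""
    gold = {frozenset(e) for e in gold_standard}  # treat edges as undirected
    best_cv, best_precision = None, -1.0
    for cv in candidates:
        kept = reproducible_edges(ppi_edges, datasets, cv_min=cv, **kw)
        if not kept:
            continue  # precision is undefined when no edge survives
        precision = sum(frozenset(e) in gold for e in kept) / len(kept)
        if precision > best_precision:
            best_cv, best_precision = cv, precision
    return best_cv, best_precision


# Toy example (invented values): two "independent" datasets with four samples each.
datasets = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1], "D": [10.0, 10.1, 10.2, 10.3]},
    {"A": [3, 1, 4, 2], "B": [6, 2, 8, 4], "C": [1, 3, 2, 4], "D": [10.2, 10.0, 10.3, 10.1]},
]
edges = [("A", "B"), ("A", "C"), ("A", "D")]
gold = [("B", "A")]  # hypothetical gold-standard direct interaction

# A-D survives here: D tracks A perfectly but is almost constant (CV about 0.011).
print(reproducible_edges(edges, datasets, cv_min=0.0))  # → {('A', 'B'): 2, ('A', 'D'): 2}
print(choose_cv_threshold([0.0, 0.05, 0.5], edges, datasets, gold))  # → (0.05, 1.0)
```

In this toy case, raising the CV cutoff to 0.05 removes the correlated but nearly invariant gene D, so only the gold-standard edge remains. Requiring support in at least `min_datasets` datasets is a simple stand-in for the reproducibility analysis the abstract mentions. A real pipeline would also need to handle platform differences between microarray and RNA-seq data and to control for multiple testing.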