Park Christopher Y, Krishnan Arjun, Zhu Qian, Wong Aaron K, Lee Young-Suk, Troyanskaya Olga G
Department of Computer Science, Princeton University, Princeton, NJ 08544, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA and Simons Center for Data Analysis, Simons Foundation, New York, NY 10010, USA.
Bioinformatics. 2015 Apr 1;31(7):1093-101. doi: 10.1093/bioinformatics/btu786. Epub 2014 Nov 26.
Leveraging the large compendium of genomic data to predict biomedical pathways and specific mechanisms of protein interactions genome-wide in metazoan organisms has been challenging. In contrast to unicellular organisms, biological and technical variation originating from diverse tissues and cell lineages is often the largest source of variation in metazoan data compendia. Therefore, a new computational strategy that accounts for tissue heterogeneity in functional genomic data is needed to accurately translate the vast amount of human genomic data into specific interaction-level hypotheses.
We developed an integrated, scalable strategy for inferring multiple human gene interaction types that takes advantage of data from diverse tissue and cell-lineage origins. Our approach predicts both the presence of a functional association and the most likely interaction type among human genes or their protein products on a whole-genome scale. We demonstrate that directly incorporating tissue contextual information improves the accuracy of our predictions and, further, that such genome-wide results can be used to significantly refine regulatory interactions from primary experimental datasets (e.g. ChIP-Seq, mass spectrometry).
An interactive website hosting all of our interaction predictions is publicly available at http://pathwaynet.princeton.edu. Software was implemented using the open-source Sleipnir library, which is available for download at https://bitbucket.org/libsleipnir/libsleipnir.bitbucket.org.
Supplementary data are available at Bioinformatics online.