Huttenhower Curtis, Haley Erin M, Hibbs Matthew A, Dumeaux Vanessa, Barrett Daniel R, Coller Hilary A, Troyanskaya Olga G
Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA.
Genome Res. 2009 Jun;19(6):1093-106. doi: 10.1101/gr.082214.108. Epub 2009 Feb 26.
Human genomic data of many types are readily available, but the complexity and scale of human molecular biology make it difficult to integrate this body of data, understand it from a systems level, and apply it to the study of specific pathways or genetic disorders. An investigator could best explore a particular protein, pathway, or disease if given a functional map summarizing the data and interactions most relevant to his or her area of interest. Using a regularized Bayesian integration system, we provide maps of functional activity and interaction networks in over 200 areas of human cellular biology, each including information from approximately 30,000 genome-scale experiments pertaining to approximately 25,000 human genes. Key to these analyses is the ability to efficiently summarize this large data collection from a variety of biologically informative perspectives: prediction of protein function and functional modules, cross-talk among biological processes, and association of novel genes and pathways with known genetic disorders. In addition to providing maps of each of these areas, we also identify biological processes active in each data set. Experimental investigation of five specific genes, AP3B1, ATP6AP1, BLOC1S1, LAMP2, and RAB11A, has confirmed novel roles for these proteins in the proper initiation of macroautophagy in amino acid-starved human fibroblasts. Our functional maps can be explored using HEFalMp (Human Experimental/Functional Mapper), a web interface allowing interactive visualization and investigation of this large body of information.
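The abstract names a regularized Bayesian integration system but does not describe its internals. As a rough illustration of the general idea only, the sketch below combines evidence from several data sets for one gene pair under a naive Bayes assumption. Everything in it is an assumption made for illustration: the function name, the use of per-data-set likelihood ratios, and the `weights` argument, which shrinks each data set's contribution toward "no information" as a stand-in for regularization. It is not the authors' implementation.

```python
import math


def posterior_functional_relationship(prior, likelihood_ratios, weights=None):
    """Posterior probability that a gene pair is functionally related.

    Hypothetical naive Bayes sketch, not the method from the paper.
    prior: prior probability of a functional relationship, strictly in (0, 1).
    likelihood_ratios: one value per data set,
        P(observation | related) / P(observation | unrelated), each > 0.
    weights: optional per-data-set factors in [0, 1]; a weight of 0 ignores
        the data set and a weight of 1 trusts it fully (assumed regularizer).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(likelihood_ratios)
    if len(weights) != len(likelihood_ratios):
        raise ValueError("weights and likelihood_ratios must have equal length")

    # Work in log-odds so that independent evidence adds up.
    log_odds = math.log(prior / (1.0 - prior))
    for ratio, weight in zip(likelihood_ratios, weights):
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += weight * math.log(ratio)

    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Calling `posterior_functional_relationship(0.5, [4.0])` starts from even odds and applies a single data set that is four times more likely under a functional relationship, so the posterior rises above the prior. Setting that data set's weight to 0 leaves the prior unchanged.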