A computational framework to integrate high-throughput '-omics' datasets for the identification of potential mechanistic links.

Affiliations

The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

Experimental and Clinical Research Centre, a joint center of Max Delbrück Centre for Molecular Medicine & Charité University Hospital, Berlin, Germany.

Publication information

Nat Protoc. 2018 Dec;13(12):2781-2800. doi: 10.1038/s41596-018-0064-z.

Abstract

We recently presented a three-pronged association study that integrated human intestinal microbiome data derived from shotgun-based sequencing with untargeted serum metabolome data and measures of host physiology. Metabolome and microbiome data are high dimensional, posing a major challenge for data integration. Here, we present a step-by-step computational protocol that details and discusses the dimensionality-reduction techniques used and methods for subsequent integration and interpretation of such heterogeneous types of data. Dimensionality reduction was achieved through a combination of data normalization approaches, binning of co-abundant genes and metabolites, and integration of prior biological knowledge. The use of prior knowledge to overcome functional redundancy across microbiome species is one central advance of our method over available alternative approaches. Applying this framework, other investigators can integrate various '-omics' readouts with variables of host physiology or any other phenotype of interest (e.g., connecting host and microbiome readouts to disease severity or treatment outcome in a clinical cohort) in a three-pronged association analysis to identify potential mechanistic links to be tested in experimental settings. Although we originally developed the framework for a human metabolome-microbiome study, it is generalizable to other organisms and environmental metagenomes, as well as to studies including other '-omics' domains such as transcriptomics and proteomics. The provided R code runs in ~1 h on a standard PC.
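The abstract describes the three-pronged association analysis only in outline. The following is a minimal, hypothetical Python sketch of the idea; the protocol itself ships R code, which is not reproduced here. It assumes each data type has already been reduced to a small set of features (for example, metabolite clusters and microbial functional modules), stored as name-to-values dictionaries over the same samples. It also substitutes a fixed |rho| cutoff on Spearman correlations for the significance testing and multiple-testing control a real analysis would need. The function names (`rank`, `pearson`, `spearman`, `three_pronged`), the cutoff, the toy data and the sign-consistency rule are illustrative assumptions, not the authors' method.

```python
"""Hypothetical sketch of a three-pronged association screen.

Assumptions (not taken from the protocol): features are already binned,
each feature is a list of values over the same samples, and associations
are scored by Spearman's rho against a fixed cutoff rather than by
significance testing with multiple-testing correction.
"""
import math
from itertools import product


def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def three_pronged(phenotype, metabolites, microbes, cutoff=0.5):
    """Return (metabolite, microbe, rho_mp, rho_bm, rho_bp) tuples.

    A pair is kept when all three |rho| values reach `cutoff` (assumed > 0)
    and the microbe~phenotype sign equals the product of the
    microbe~metabolite and metabolite~phenotype signs.
    """
    hits = []
    for (met, m_vals), (mic, b_vals) in product(metabolites.items(), microbes.items()):
        r_mp = spearman(m_vals, phenotype)  # metabolite ~ phenotype
        r_bm = spearman(b_vals, m_vals)     # microbial feature ~ metabolite
        r_bp = spearman(b_vals, phenotype)  # microbial feature ~ phenotype
        if min(abs(r_mp), abs(r_bm), abs(r_bp)) < cutoff:
            continue
        # r_bm * r_mp is positive exactly when the two share a sign.
        if (r_bp > 0) == ((r_bm > 0) == (r_mp > 0)):
            hits.append((met, mic, round(r_mp, 3), round(r_bm, 3), round(r_bp, 3)))
    return hits


# Toy data: six samples, two metabolite clusters, two microbial modules.
phenotype = [1, 2, 3, 4, 5, 6]
metabolites = {"met_A": [2, 4, 5, 7, 9, 12], "met_B": [3, 6, 1, 5, 2, 4]}
microbes = {"module_X": [10, 12, 15, 18, 20, 25], "module_Y": [5, 1, 6, 2, 4, 3]}

print(three_pronged(phenotype, metabolites, microbes))
# → [('met_A', 'module_X', 1.0, 1.0, 1.0)]
```

In this sketch, requiring sign agreement keeps only those triplets in which the direction of the microbe-phenotype association matches the direction implied by the other two links. Such a triplet can then be treated as a candidate mechanistic chain for experimental follow-up rather than as evidence of causation.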

