Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America.
Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark.
PLoS Comput Biol. 2021 Feb 2;17(2):e1008647. doi: 10.1371/journal.pcbi.1008647. eCollection 2021 Feb.
The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could enable detailed inference of the underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and found that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.
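The core comparison described above, decomposing each dataset with ICA and asking whether the same components reappear, can be illustrated with a small synthetic sketch. It assumes the common convention X ≈ M·A (a genes × samples expression matrix factored into gene weights M and component activities A), a hand-written symmetric FastICA in NumPy, and matching of components across datasets by reciprocal best absolute Pearson correlation of their gene weights. The function names (`fast_ica`, `match_components`), the 0.5 threshold, and the simulated data are hypothetical; this is not the authors' pipeline.

```python
import numpy as np


def sym_decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W, keeping rows orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u * (1.0 / np.sqrt(s))) @ u.T @ W


def fast_ica(X, k, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on a genes x samples matrix X.

    Genes are treated as observations, so the returned genes x k matrix holds
    k statistically independent gene-weight vectors (one column per component).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                        # center each sample across genes
    cov = Xc.T @ Xc / Xc.shape[0]                  # samples x samples covariance
    vals, vecs = np.linalg.eigh(cov)
    top = np.argsort(vals)[::-1][:k]               # keep the k leading directions
    Z = Xc @ (vecs[:, top] / np.sqrt(vals[top]))   # whitened data, genes x k
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)                       # g(w_i^T z) for every gene
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = G.T @ Z / Z.shape[0] - (1 - G**2).mean(axis=0)[:, None] * W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def match_components(M1, M2, threshold=0.5):
    """Pair components across two decompositions by reciprocal best |Pearson r|."""
    k1 = M1.shape[1]
    C = np.abs(np.corrcoef(M1.T, M2.T)[:k1, k1:])  # abs(): ICA signs are arbitrary
    matches = []
    for i in range(C.shape[0]):
        j = int(np.argmax(C[i]))
        if int(np.argmax(C[:, j])) == i and C[i, j] >= threshold:
            matches.append((i, j, float(C[i, j])))
    return matches


# Hypothetical demo: two "datasets" share the same gene weights M_true but have
# different samples (activities) and independent measurement noise.
rng = np.random.default_rng(42)
n_genes, k = 2000, 5
M_true = rng.laplace(size=(n_genes, k))            # heavy-tailed gene weights


def simulate(n_samples):
    A = rng.standard_normal((k, n_samples))         # component activities
    return M_true @ A + 0.05 * rng.standard_normal((n_genes, n_samples))


M1 = fast_ica(simulate(30), k)
M2 = fast_ica(simulate(20), k)
matches = match_components(M1, M2)
for i, j, r in matches:
    print(f"component {i} (dataset 1) <-> component {j} (dataset 2): |r| = {r:.3f}")
```

Because ICA recovers components only up to order, sign, and scale, the comparison uses absolute correlation and a reciprocal-best-hit rule rather than comparing columns by index. With shared gene weights, every component in one simulated dataset should find a strongly correlated partner in the other, which is the kind of cross-dataset conservation the abstract reports for real E. coli data.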