Department of Bioengineering, University of California San Diego, La Jolla, California, USA.
Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark.
mSphere. 2022 Apr 27;7(2):e0003322. doi: 10.1128/msphere.00033-22. Epub 2022 Mar 21.
Mycobacterium tuberculosis is one of the most consequential human bacterial pathogens, posing a serious challenge to 21st-century medicine. A key feature of its pathogenicity is its ability to adapt its transcriptional response to environmental stresses through its transcriptional regulatory network (TRN). While many studies have sought to characterize specific portions of the M. tuberculosis TRN, and some have performed system-level analysis, few have provided a network-based model of the TRN that also captures the relative shifts in transcriptional regulator activity triggered by changing environments. Here, we compiled a compendium of nearly 650 publicly available, high-quality M. tuberculosis RNA-sequencing data sets and applied an unsupervised machine learning method to obtain a quantitative, top-down TRN. It consists of 80 independently modulated gene sets known as "iModulons," 41 of which correspond to known regulons. These iModulons explain 61% of the variance in the organism's transcriptional response. We show that iModulons (i) reveal the function of poorly characterized regulons, (ii) describe the transcriptional shifts that occur during environmental changes such as shifting carbon sources, oxidative stress, and infection events, and (iii) identify intrinsic clusters of regulons that link several important metabolic systems, including lipid, cholesterol, and sulfur metabolism. This transcriptome-wide analysis of the M. tuberculosis TRN informs future research on effective ways to study and manipulate its transcriptional regulation and presents a knowledge-enhanced database of all published high-quality RNA-seq data for this organism to date.
Mycobacterium tuberculosis H37Rv is one of the world's most impactful pathogens, and much of its success relies on the differential expression of its genes to adapt to its environment. The expression of the organism's genes is driven primarily by its transcriptional regulatory network, and most research on the TRN focuses on identifying and quantifying clusters of coregulated genes known as regulons. While previous studies have relied on molecular measurements, in this study we used an alternative technique that applies machine learning to a large transcriptomic data set. This approach is less reliant on hypotheses about the role of specific regulatory systems and allows new biological findings to be drawn from already collected data. A better understanding of the structure of the M. tuberculosis TRN will have important implications for the design of improved therapeutic approaches.