Bioinformatics Division, Beijing National Research Institute for Information Science and Technology (BNRIST), Department of Automation, Tsinghua University, Beijing, 100084, People's Republic of China.
Bioscience Department, COMSATS Institute of Information Technology, Islamabad, 44000, Pakistan.
BMC Genomics. 2021 Jan 19;22(1):60. doi: 10.1186/s12864-020-07357-5.
Efficient regulation of bacterial genes in response to environmental stimuli results in unique gene clusters known as operons. The lack of complete operonic reference and functional information makes the prediction of metagenomic operons a challenging task, yet one that opens new perspectives on the interpretation of host-microbe interactions.
In this work, we identified whole-genome and metagenomic operons via MetaRon (Metagenome and whole-genome opeRon prediction pipeline). MetaRon identifies operons without any experimental or functional information. We applied MetaRon to datasets with different levels of complexity and information: a single whole genome, a simulated mixture of three whole genomes (E. coli MG1655, Mycobacterium tuberculosis H37Rv and Bacillus subtilis str. 16), an E. coli c20 draft genome extracted from chicken gut, and finally 145 whole-metagenome samples from the human gut. MetaRon consistently achieved high operon prediction sensitivity, specificity and accuracy on the E. coli whole genome (97.8, 94.1 and 92.4%), the simulated genome (93.7, 75.5 and 88.1%) and E. coli c20 (87, 91 and 88%), respectively. Finally, we identified 1,232,407 unique operons from 145 paired-end human gut metagenome samples. We also report a strong association of type 2 diabetes with maltose phosphorylase (K00691), 3-deoxy-D-glycero-D-galacto-nononate 9-phosphate synthase (K21279) and an uncharacterized protein (K07101).
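For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how sensitivity, specificity and accuracy are conventionally derived from confusion counts. The class and function names are illustrative, the unit being classified (for example, a gene pair or an operon) is left open, and the example counts are made up for demonstration; none of this is taken from the MetaRon implementation or its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of predictions compared against a reference annotation."""
    tp: int  # predicted operonic, reference operonic
    fp: int  # predicted operonic, reference non-operonic
    tn: int  # predicted non-operonic, reference non-operonic
    fn: int  # predicted non-operonic, reference operonic


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of reference positives that were recovered: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of reference negatives correctly rejected: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all calls that agree with the reference."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity={sensitivity(example):.3f}")
print(f"specificity={specificity(example):.3f}")
print(f"accuracy={accuracy(example):.3f}")
```

Reporting the three values together is useful because accuracy alone can hide an imbalance between missed operons (low sensitivity) and spurious ones (low specificity).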
With MetaRon, we were able to remove two notable limitations of existing whole-genome operon prediction methods: (1) generalizability (the ability to predict operons in unrelated bacterial genomes), and (2) whole-genome and metagenomic data management. We also demonstrate the use of operons as a subset that represents the trends of secondary metabolites in whole-metagenome data, and the role of secondary metabolites in the occurrence of a disease condition. Using operonic data from metagenomes to study secondary metabolic trends will significantly reduce the data volume to a smaller, more precise subset. Furthermore, the identification of metabolic pathways associated with the occurrence of type 2 diabetes (T2D) presents another dimension for analyzing the human gut metagenome. To the best of our knowledge, this study is the first organized effort to predict metagenomic operons and perform a detailed analysis of their association with a disease, in this case type 2 diabetes. Applying MetaRon to metagenomic data at diverse scales will be beneficial for understanding gene regulation and for therapeutic metagenomics.