Hadjithomas Michalis, Chen I-Min Amy, Chu Ken, Ratner Anna, Palaniappan Krishna, Szeto Ernest, Huang Jinghua, Reddy T B K, Cimermančič Peter, Fischbach Michael A, Ivanova Natalia N, Markowitz Victor M, Kyrpides Nikos C, Pati Amrita
Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, California, USA.
Biosciences Computing, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
mBio. 2015 Jul 14;6(4):e00932. doi: 10.1128/mBio.00932-15.
In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-standing void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of unexplored phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules.
IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.