Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 240 Longwood Avenue, Boston, Massachusetts 02115, United States.
J Chem Inf Model. 2021 Jun 28;61(6):2560-2571. doi: 10.1021/acs.jcim.0c01304. Epub 2021 May 27.
Research in natural products, the genetically encoded small molecules produced by organisms in an idiosyncratic fashion, deals with molecular structure, biosynthesis, and biological activity. Bioinformatics analyses of microbial genomes can successfully reveal the genetic instructions (biosynthetic gene clusters) that produce many natural products. Gene-to-molecule predictions made on biosynthetic gene clusters have revealed many important new structures. There is no comparable method for gene-to-biological-activity predictions. To address this missing pathway, we developed a machine learning bioinformatics method for predicting a natural product's antibiotic activity directly from the sequence of its biosynthetic gene cluster. We trained commonly used machine learning classifiers to predict antibacterial or antifungal activity based on features of known natural product biosynthetic gene clusters. We have identified classifiers that attain accuracies as high as 80% and that have enabled the identification of biosynthetic enzymes, and their corresponding molecular features, that are associated with antibiotic activity.
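The general idea of training a classifier on features of biosynthetic gene clusters can be illustrated with a minimal sketch. It is not the authors' pipeline. The abstract does not say which features or classifiers were used, so the sketch assumes each cluster is described by counts of annotation labels and uses logistic regression as one example of a commonly used classifier. The label names (`domain_A` and so on), the toy data, and all function names are hypothetical.

```python
import math

# Hypothetical toy data: each cluster is a list of made-up annotation labels,
# paired with an activity label (1 = antibacterial, 0 = not antibacterial).
TOY_CLUSTERS = [
    (["domain_A", "domain_A", "domain_B"], 1),
    (["domain_A", "domain_C"], 1),
    (["domain_A", "domain_A", "domain_A"], 1),
    (["domain_D", "domain_B"], 0),
    (["domain_D", "domain_D", "domain_C"], 0),
    (["domain_D"], 0),
]


def build_vocabulary(clusters):
    """Collect every annotation label seen in the training clusters, sorted."""
    return sorted({label for annotations, _ in clusters for label in annotations})


def featurize(annotations, vocabulary):
    """Turn one cluster into a count vector; labels outside the vocabulary are ignored."""
    return [annotations.count(label) for label in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(features, weights, bias):
    """Predicted probability that the cluster's product is active."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=200):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = score(features, weights, bias) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(features, weights, bias):
    """Binary call: 1 if the predicted probability is at least 0.5."""
    return 1 if score(features, weights, bias) >= 0.5 else 0


vocabulary = build_vocabulary(TOY_CLUSTERS)
X = [featurize(annotations, vocabulary) for annotations, _ in TOY_CLUSTERS]
y = [label for _, label in TOY_CLUSTERS]
weights, bias = train_logistic_regression(X, y)

# Accuracy on the training data only; a real evaluation would use held-out
# clusters or cross-validation.
correct = sum(predict(features, weights, bias) == label
              for features, label in zip(X, y))
training_accuracy = correct / len(y)

# Learned weights give a rough view of which features drive the prediction.
feature_weights = dict(zip(vocabulary, weights))
```

In this sketch, inspecting `feature_weights` plays the role of asking which cluster features are associated with activity. The toy data are separable by construction, so the resulting accuracy says nothing about the accuracies reported in the abstract.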