Department of Pharmaceutical Sciences, Center for Biomolecular Sciences, College of Pharmacy, University of Illinois at Chicago, 833 S. Wood St., Chicago, IL 60612, USA.
Institute of Marine Biochemistry, Vietnam Academy of Science and Technology, Nghiado, Caugiay, Hanoi 10000, Vietnam.
Molecules. 2022 Mar 22;27(7):2038. doi: 10.3390/molecules27072038.
Libraries of microorganisms have served as a cornerstone of therapeutic drug discovery, though the continued re-isolation of known natural product chemical entities has remained a significant obstacle to discovery efforts. A major contributing factor to this redundancy is the duplication of bacterial taxa in a library. Duplication can be mitigated with a variety of DNA sequencing strategies and/or mass spectrometry-informed bioinformatics platforms, so that the library is created with minimal phylogenetic overlap and thus minimal natural product overlap. IDBac is a MALDI-TOF mass spectrometry-based bioinformatics platform used to assess overlap within collections of environmental bacterial isolates. It allows environmental isolate redundancy to be reduced while considering both phylogeny and natural product production. However, manually selecting isolates for addition to a library during this process was time-intensive and left to the researcher's discretion. Here, we developed an algorithm that automates the prioritization of hundreds to thousands of environmental microorganisms in IDBac. The algorithm iteratively reduces natural product mass feature overlap within groups of isolates that share high homology of protein mass features. This automation minimizes human bias and greatly increases efficiency in the microbial strain prioritization process.
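The idea of iteratively reducing natural product mass feature overlap within protein-similarity groups can be illustrated with a minimal sketch. This is not the IDBac implementation; it swaps in a plain greedy coverage heuristic as one plausible reading of the abstract. It assumes that isolates have already been grouped by protein mass feature similarity and that each isolate's natural product mass features are represented as a set of binned m/z integers. The names `prioritize_group`, `prioritize_library`, the `coverage` parameter, and the example data are all hypothetical.

```python
from typing import Dict, List, Set


def prioritize_group(features: Dict[str, Set[int]], coverage: float = 1.0) -> List[str]:
    """Greedily pick isolates from one group until the requested fraction
    of the group's distinct mass features is covered (assumed heuristic)."""
    all_features: Set[int] = set().union(*features.values())
    target = coverage * len(all_features)
    covered: Set[int] = set()
    selected: List[str] = []
    remaining = dict(features)
    while remaining and len(covered) < target:
        # Keep the isolate that adds the most unseen features;
        # ties are broken alphabetically by isolate name.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining isolate is fully redundant
        covered |= gain
        selected.append(best)
    return selected


def prioritize_library(
    groups: Dict[str, Dict[str, Set[int]]], coverage: float = 1.0
) -> Dict[str, List[str]]:
    """Apply the greedy selection independently to each protein-similarity group."""
    return {group: prioritize_group(isolates, coverage) for group, isolates in groups.items()}


# Hypothetical input: two protein-similarity groups with binned m/z features.
example_groups = {
    "cluster_A": {"iso1": {501, 623, 745}, "iso2": {501, 623}, "iso3": {745, 890}},
    "cluster_B": {"iso4": {1021}, "iso5": {1021}},
}

if __name__ == "__main__":
    print(prioritize_library(example_groups))
```

In this toy input, `iso2` and `iso5` contribute no mass features beyond those of the isolates already kept, so they are dropped. Lowering `coverage` below 1.0 trades feature coverage for a smaller library.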