School of Computing Science, University of Glasgow, Glasgow, United Kingdom.
Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
PLoS Comput Biol. 2021 May 4;17(5):e1008920. doi: 10.1371/journal.pcbi.1008920. eCollection 2021 May.
Specialised metabolites from microbial sources are well known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites is a promising way of finding such novel chemistry. However, because detailed biosynthetic knowledge is lacking for the majority of predicted BGCs, and because the number of possible BGC-metabolite combinations is large, this is not a simple task. The problem is becoming ever more pressing as paired omics data sets become increasingly available. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. By standardising a commonly used score, we obtain a new, more effective score, and we introduce a further novel score based on an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.