Department of Chemistry, Simon Fraser University, Burnaby, BC, Canada.
Nat Prod Rep. 2021 Jan 1;38(1):264-278. doi: 10.1039/d0np00053a. Epub 2020 Aug 28.
Covering: 2010-2020

The digital revolution is driving significant changes in how people store, distribute, and use information. With the advent of new technologies around linked data, machine learning and large-scale network inference, the natural products research field is beginning to embrace real-time sharing and large-scale analysis of digitized experimental data. Databases play a key role in this, as they allow systematic annotation and storage of data for both basic and advanced applications. The quality of the content, structure, and accessibility of these databases all contribute to their usefulness for the scientific community in practice. This review covers the development of databases relevant for microbial natural product discovery during the past decade (2010-2020), including repositories of chemical structures/properties, metabolomics, and genomic data (biosynthetic gene clusters). It provides an overview of the most important databases and their functionalities, highlights some early meta-analyses using such databases, and discusses basic principles to enable widespread interoperability between databases. Furthermore, it points out conceptual and practical challenges in the curation and usage of natural products databases. Finally, the review closes with a discussion of key action points required for the field moving forward, not only for database developers but for any scientist active in the field.