Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
J Magn Reson. 2022 Sep;342:107268. doi: 10.1016/j.jmr.2022.107268. Epub 2022 Jul 16.
NMR is a valuable experimental tool in the structural biologist's toolkit for elucidating the structures, functions, and motions of biomolecules. Progress in machine learning, particularly in structural biology, has shown how critical large, diverse, and reliable datasets are for developing new methods and advancing understanding in structural biology and in science more broadly. Biomolecular NMR research groups produce large amounts of data, and there is renewed interest in organizing these data to train new, sophisticated machine learning architectures and to improve biomolecular NMR analysis pipelines. The foundational data type in NMR is the free-induction decay (FID). There are opportunities to build sophisticated machine learning methods that use NMR FIDs to tackle long-standing problems in NMR data processing, resonance assignment, dynamics analysis, and structure determination. Our goal in this study is to provide a lightweight, broadly available tool for archiving FID data as they are generated at the spectrometer, and to grow a new resource of FID data and associated metadata. This study presents a relational schema, which we call Spectral Database (SpecDB), for storing and organizing the metadata items that describe an NMR sample and FID data. SpecDB is implemented in SQLite and includes a Python software library providing a command-line application to create, organize, query, back up, share, and maintain the database. Together, the software tools and database schema allow users to store, organize, share, and learn from NMR time-domain data. SpecDB is freely available under an open-source license at https://github.rpi.edu/RPIBioinformatics/SpecDB.
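To make the idea of a relational schema for sample and FID metadata concrete, the following is a minimal sketch using Python's standard `sqlite3` module. It is not the SpecDB schema or API: the table names (`sample`, `fid`), every column, the example values, and the helper functions `build_demo_db` and `fids_for_sample` are hypothetical and chosen only for illustration. It shows the general pattern the abstract describes, in which each archived FID record points back to the sample it was collected on.

```python
import sqlite3

# Hypothetical two-table layout for illustration only.
# This is NOT the actual SpecDB schema; all names are assumptions.
SCHEMA = """
CREATE TABLE sample (
    id     INTEGER PRIMARY KEY,
    label  TEXT NOT NULL UNIQUE,   -- human-readable sample name
    buffer TEXT                    -- free-text buffer description
);
CREATE TABLE fid (
    id         INTEGER PRIMARY KEY,
    sample_id  INTEGER NOT NULL REFERENCES sample(id),
    experiment TEXT NOT NULL,      -- experiment type
    field_mhz  REAL,               -- 1H frequency of the spectrometer
    path       TEXT NOT NULL       -- location of the raw time-domain data
);
"""


def build_demo_db():
    """Create an in-memory database holding one sample and two FID records."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO sample (label, buffer) VALUES (?, ?)",
        ("demo_protein", "20 mM phosphate, pH 6.5"),
    )
    sample_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO fid (sample_id, experiment, field_mhz, path) "
        "VALUES (?, ?, ?, ?)",
        [
            (sample_id, "1H-15N HSQC", 600.0, "demo/1/fid"),
            (sample_id, "HNCA", 600.0, "demo/2/fid"),
        ],
    )
    conn.commit()
    return conn


def fids_for_sample(conn, label):
    """Return (experiment, field_mhz, path) rows for the named sample."""
    return conn.execute(
        "SELECT f.experiment, f.field_mhz, f.path "
        "FROM fid AS f JOIN sample AS s ON s.id = f.sample_id "
        "WHERE s.label = ? ORDER BY f.id",
        (label,),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in fids_for_sample(db, "demo_protein"):
        print(row)
```

Keeping sample metadata in its own table means a sample is described once and then referenced by every FID collected on it, which is the usual motivation for a relational layout over one flat file per experiment.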