Flanagan Aaron R, Glavin Frank G
School of Computer Science, University of Galway, Galway City, Co. Galway, H91 FYH2, Ireland.
Sci Data. 2025 Mar 24;12(1):498. doi: 10.1038/s41597-025-04848-6.
Raman spectroscopy is utilised extensively in pharmaceutical analysis for tasks such as drug discovery, quality control and active pharmaceutical ingredient (API) development. Despite this, access to open-source Raman spectral datasets for modelling and analysis is often a challenge. In laboratory settings, small spectral libraries are typically compiled for one-shot identification of intermediates or unknown chemicals, which restricts the availability of comprehensive, high-quality reference data. In this work, we introduce a new open-source Raman dataset consisting of pure chemical compounds commonly employed in the development of APIs. By curating and publishing this dataset, we aim to provide the scientific community with access to high-quality, reusable data. Containing 3,510 samples spanning 32 compounds, the dataset can be used as a reference and can potentially facilitate the development of more accurate and generalisable calibration models when access to reference data is limited.