Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, United States Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA.
National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA.
Sci Data. 2019 Aug 2;6(1):141. doi: 10.1038/s41597-019-0145-z.
Confident identification of unknown chemicals in high resolution mass spectrometry (HRMS) screening studies requires cohesive workflows and complementary data, tools, and software. Chemistry databases, screening libraries, and chemical metadata have become fixtures in identification workflows. To increase confidence in compound identifications, the use of structural fragmentation data collected via tandem mass spectrometry (MS/MS or MS²) is vital. However, the availability of empirically collected MS/MS data for identification of unknowns is limited. Researchers have therefore turned to in silico generation of MS/MS data for use in HRMS-based screening studies. This paper describes the generation en masse of predicted MS/MS spectra for the entirety of the US EPA's DSSTox database using competitive fragmentation modelling and a freely available open source tool, CFM-ID. The generated dataset comprises predicted MS/MS spectra for ~700,000 structures, and mappings between predicted spectra, structures, associated substances, and chemical metadata. Together, these resources facilitate improved compound identifications in HRMS screening studies. These data are accessible via an SQL database, a comma-separated export file (.csv), and EPA's CompTox Chemicals Dashboard.
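As an illustration of how predicted MS/MS spectra might be used in a screening workflow, the following Python sketch filters candidate structures by precursor m/z within a ppm window and then ranks them by a simple cosine similarity between the observed and predicted fragment peaks. It is a minimal sketch under stated assumptions, not the method used to build or query the dataset: the record layout (`id`, `precursor_mz`, `peaks`), the function names, the tolerances, and the greedy peak-matching rule are all hypothetical and do not reflect the actual schema of the SQL database or the .csv export.

```python
from math import sqrt


def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a given ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def cosine_score(observed, predicted, tol_da=0.01):
    """Cosine similarity between two peak lists of (m/z, intensity) tuples.

    Each observed peak is paired greedily with the closest unused predicted
    peak within tol_da; unmatched peaks contribute only to the norms.
    """
    used = set()
    dot = 0.0
    for mz_o, int_o in observed:
        best = None  # (mass difference, predicted index, predicted intensity)
        for j, (mz_p, int_p) in enumerate(predicted):
            if j in used:
                continue
            diff = abs(mz_o - mz_p)
            if diff <= tol_da and (best is None or diff < best[0]):
                best = (diff, j, int_p)
        if best is not None:
            used.add(best[1])
            dot += int_o * best[2]
    norm_o = sqrt(sum(i * i for _, i in observed))
    norm_p = sqrt(sum(i * i for _, i in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_o * norm_p)


def rank_candidates(precursor_mz, observed, library, ppm=10.0):
    """Keep library entries inside the precursor ppm window, best score first."""
    low, high = ppm_window(precursor_mz, ppm)
    hits = [
        (cosine_score(observed, entry["peaks"]), entry["id"])
        for entry in library
        if low <= entry["precursor_mz"] <= high
    ]
    return sorted(hits, reverse=True)


# Hypothetical library records; real entries would come from the dataset.
library = [
    {"id": "cand_A", "precursor_mz": 200.0000, "peaks": [(50.0, 100.0), (120.0, 40.0)]},
    {"id": "cand_B", "precursor_mz": 200.0010, "peaks": [(75.0, 100.0)]},
    {"id": "cand_C", "precursor_mz": 250.0000, "peaks": [(50.0, 100.0), (120.0, 40.0)]},
]
observed = [(50.002, 100.0), (120.001, 40.0)]
ranked = rank_candidates(200.0005, observed, library, ppm=10.0)
```

In this toy example `cand_C` falls outside the precursor window and is never scored, while `cand_A` outranks `cand_B` because its predicted fragments match the observed peaks. A real workflow would likely also account for collision energy, ionization mode, and adduct type, which this sketch ignores.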