Institute for Risk Assessment Sciences (IRAS), Utrecht University, Utrecht, the Netherlands; Department Environment & Health, Vrije Universiteit, Amsterdam, the Netherlands.
Department Environment & Health, Vrije Universiteit, Amsterdam, the Netherlands.
Environ Int. 2021 Jul;152:106511. doi: 10.1016/j.envint.2021.106511. Epub 2021 Mar 24.
Chemicals of Emerging Concern (CECs) comprise a very broad group of chemicals that are suspected of causing adverse health effects, but for which very limited information is available. Chromatographic techniques coupled with high-resolution mass spectrometry (HRMS) can be used for non-targeted screening and detection of CECs when combined with comprehensive annotation databases. Establishing a database focused on the annotation of CECs in human samples will provide new insight into the distribution and extent of human exposure to a wide range of CECs.
This study describes an approach for the aggregation and curation of an annotation database (CECscreen) for the identification of CECs in human biological samples.
The approach consists of three main parts. First, CEC compound lists from various sources were aggregated, and duplicates and inorganic compounds were removed. Subsequently, the list was curated by standardization of structures to create "MS-ready" and "QSAR-ready" SMILES, as well as calculation of exact masses (monoisotopic and adducts) and molecular formulas. The second step included the simulation of Phase I metabolites. The third and final step included the calculation of QSAR predictions related to physicochemical properties, environmental fate, toxicity and Absorption, Distribution, Metabolism, Excretion (ADME) processes and the retrieval of information from the US EPA CompTox Chemicals Dashboard.
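The exact-mass calculation mentioned in the first step can be illustrated with a minimal sketch. This is not the CECscreen pipeline itself: the function names `monoisotopic_mass` and `adduct_mz`, the small element table, and the choice of three adducts are assumptions made for illustration, and the parser handles only simple formulas without brackets, charges, or isotope labels.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
# Only a handful of elements are listed; extend as needed.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
    "Cl": 34.96885268,
}
PROTON = 1.007276466812      # mass of a proton (u)
ELECTRON = 0.000548579909    # mass of an electron (u)
SODIUM = 22.98976928         # monoisotopic mass of Na (u)


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C8H10N4O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def adduct_mz(neutral_mass):
    """Return m/z values for a few common singly charged adducts (hypothetical selection)."""
    return {
        "[M+H]+": neutral_mass + PROTON,
        "[M-H]-": neutral_mass - PROTON,
        "[M+Na]+": neutral_mass + SODIUM - ELECTRON,
    }


caffeine = monoisotopic_mass("C8H10N4O2")
for adduct, mz in adduct_mz(caffeine).items():
    print(f"{adduct}: {mz:.4f}")
```

In practice, a cheminformatics toolkit would normally derive the formula and mass from the standardized "MS-ready" structure rather than from a hand-parsed formula string.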
All CECscreen database and property files are publicly available (DOI: https://doi.org/10.5281/zenodo.3956586). In total, 145,284 entries were aggregated from various CEC data sources. After elimination of duplicates and curation, the pipeline produced 70,397 unique "MS-ready" structures and 66,071 unique "QSAR-ready" structures, corresponding to 69,526 CAS numbers. Simulation of Phase I metabolites resulted in 306,279 unique metabolites. QSAR predictions could be performed for 64,684 of the "QSAR-ready" structures, whereas information was retrieved from the CompTox Chemicals Dashboard for 59,739 of the 69,526 CAS numbers queried. CECscreen is incorporated in the in silico fragmentation approach MetFrag.
The CECscreen database can be used to prioritize annotation of CECs measured by non-targeted HRMS, facilitating the large-scale detection of CECs in human samples for exposome research. Large-scale detection of CECs can be further improved by integrating the present database with resources that contain CECs (metabolites) and meta-data measurements, by further expansion towards in silico and experimental (e.g., MassBank) generation of MS/MS spectra, and by development of bioinformatics approaches capable of using correlation patterns in the measured chemical features.
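Annotation of non-targeted HRMS features against a database of this kind typically starts from an accurate-mass lookup within a ppm tolerance. The sketch below shows one possible way to do this and is not taken from the study: the `annotate` function, the 5 ppm default, and the two-entry example database are hypothetical, and a real workflow would add further evidence such as retention time, isotope pattern, or MS/MS fragmentation.

```python
from bisect import bisect_left, bisect_right


def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a given ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def annotate(features, database, ppm=5.0):
    """Map each measured m/z to the database entries within the ppm window.

    `database` is a list of (name, theoretical_mz) tuples.
    """
    entries = sorted(database, key=lambda entry: entry[1])
    masses = [mz for _, mz in entries]
    hits = {}
    for mz in features:
        low, high = ppm_window(mz, ppm)
        start = bisect_left(masses, low)
        stop = bisect_right(masses, high)
        hits[mz] = [name for name, _ in entries[start:stop]]
    return hits


# Illustrative entries only; theoretical [M+H]+ values rounded to 4 decimals.
example_db = [
    ("caffeine [M+H]+", 195.0877),
    ("bisphenol A [M+H]+", 229.1223),
]
print(annotate([195.0880, 300.0000], example_db))
```

Sorting the theoretical masses once and using binary search keeps each lookup fast, which matters when many thousands of features are matched against a database with tens of thousands of structures and their adducts.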