Zhao Tingting, Low Brian, Shen Qiming, Wang Yukai, Hidalgo Delgado David, Chau K N Minh, Pang Zhiqiang, Li Xiaoxiao, Xia Jianguo, Li Xing-Fang, Huan Tao
Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada.
Division of Analytical and Environmental Toxicology, Department of Laboratory Medicine and Pathology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta T6G 2G3, Canada.
Anal Chem. 2025 Jun 3;97(21):11099-11109. doi: 10.1021/acs.analchem.5c00503. Epub 2025 May 22.
Over 70% of organic halogens, representing chlorine- and bromine-containing disinfection byproducts (Cl-/Br-DBPs), remain unidentified after 50 years of research. This work introduces a streamlined, cloud-based exposomics workflow that integrates high-resolution mass spectrometry (HRMS) analysis, multistage machine learning, and cloud computing for efficient analysis and characterization of Cl-/Br-DBPs. In particular, the multistage machine learning structure employs progressively different heavy isotopic peaks at each layer and captures the distinct isotopic characteristics of nonhalogenated compounds and Cl-/Br-compounds at different halogenation levels. This approach enables the recognition of 22 types of Cl-/Br-compounds with up to 6 Br and 8 Cl atoms. To address the data imbalance among classes, particularly the limited number of heavily chlorinated and brominated compounds, data perturbation is performed to generate hypothetical/synthetic molecular formulas containing multiple Cl and Br atoms, thereby augmenting the training data. To benefit members of the environmental chemistry community who have limited computational experience and hardware access, the above innovations are incorporated into HalogenFinder (http://www.halogenfinder.com/), a user-friendly, web-based platform for Cl-/Br-compound characterization, with statistical analysis support via MetaboAnalyst. In benchmarking, HalogenFinder outperformed two established tools, achieving a higher recognition rate for 277 authentic Cl-/Br-compounds and uniquely identifying the number of Cl/Br atoms. In laboratory tests of DBP mixtures, it identified 72 Cl-/Br-DBPs with proposed structures, of which eight were confirmed with chemical standards. A retrospective analysis of 2022 finished water HRMS data revealed temporal trends in Cl-DBP features. These results demonstrate HalogenFinder's effectiveness in advancing Cl-/Br-compound identification for environmental science and exposomics.
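As background for why heavy isotopic peaks can separate halogenation levels, the following is a minimal sketch of how the M, M+2, M+4, ... peak pattern changes with the number of Cl and Br atoms. It is an illustration only, not HalogenFinder's implementation: the function name `halogen_pattern` is hypothetical, the abundances are approximate natural values, and contributions from 13C and all other elements are ignored.

```python
import numpy as np

# Approximate natural isotopic abundances (assumed values for illustration).
CL = (0.7576, 0.2424)  # 35Cl, 37Cl
BR = (0.5069, 0.4931)  # 79Br, 81Br


def halogen_pattern(n_cl, n_br):
    """Relative intensities of the M, M+2, M+4, ... peaks from Cl/Br only.

    Each halogen atom contributes a two-point distribution (light, heavy),
    so the combined pattern is the repeated convolution of those
    distributions. The result is scaled so the tallest peak equals 1.
    """
    pattern = np.array([1.0])
    for _ in range(n_cl):
        pattern = np.convolve(pattern, CL)
    for _ in range(n_br):
        pattern = np.convolve(pattern, BR)
    return pattern / pattern.max()


if __name__ == "__main__":
    # Print the pattern for a few hypothetical halogen compositions.
    for n_cl, n_br in [(1, 0), (2, 0), (0, 1), (1, 1)]:
        print(f"Cl{n_cl} Br{n_br}:", np.round(halogen_pattern(n_cl, n_br), 3))
```

A compound with n Cl and m Br atoms yields n + m + 1 peaks spaced about 2 Da apart in this simplified model, and the relative heights differ between compositions, which is the kind of signal a classifier could learn from.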