Evaluation and application of machine learning-based retention time prediction for suspect screening of pesticides and pesticide transformation products in LC-HRMS.

Affiliations

Shanghai Municipal Center for Disease Control and Prevention, Shanghai, 200336, China; State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, Shanghai, 200336, China.

Shanghai Changning Center for Disease Control and Prevention, Shanghai, 200051, China.

Publication information

Chemosphere. 2021 May;271:129447. doi: 10.1016/j.chemosphere.2020.129447. Epub 2020 Dec 27.

Abstract

Computational QSAR models are increasingly preferred for retention time prediction when mining liquid chromatography-mass spectrometry data for emerging environmental contaminants. Model performance generally depends on the machine learning algorithm, the chemical features, and the training data. In this study, we evaluated four algorithms on three feature sets, using 321 pesticides as the training set and 77 as the validation set. Results varied with the combination of algorithm and feature set. Two strategies, increasing the complexity of the chemical features and enlarging the training set, were shown to improve the results. XGBoost, Random Forest, and LightGBM performed best when built on a large set of chemical descriptors, while the Keras model preferred fingerprints. The four models had comparable prediction accuracy: at least 90% of the pesticides in the validation set were predicted with ΔRT < 1.0 min. A blended prediction strategy that averages the results of the four models performed better than any single model. This strategy was used to assist the identification of pesticides and pesticide transformation products in 120 strawberry samples from a national survey of food contamination. Twenty pesticides and twelve pesticide transformation products were tentatively identified; all pesticides and two of the transformation products (bifenazate diazene and spirotetramat-enol) were confirmed with reference standards. These results suggest that retention time prediction is a valuable approach to compound identification when integrated with in silico MS spectra and other MS identification strategies.
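The blended strategy and the ΔRT < 1.0 min criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `blend_predictions` and `fraction_within`, the four placeholder models, and all retention times below are invented for illustration, and the sketch assumes the blend is a plain unweighted mean of the per-model predictions.

```python
from statistics import mean


def blend_predictions(per_model_predictions):
    """Average predicted retention times (min) across models, compound by compound."""
    return [mean(values) for values in zip(*per_model_predictions)]


def fraction_within(predicted, observed, tolerance_min=1.0):
    """Share of compounds whose |predicted - observed| RT is below the tolerance."""
    hits = sum(abs(p - o) < tolerance_min for p, o in zip(predicted, observed))
    return hits / len(observed)


# Made-up retention times (min) for three compounds; not data from the study.
observed = [5.0, 8.0, 12.0]
per_model = [
    [5.4, 7.0, 12.2],  # hypothetical model A
    [4.8, 8.6, 13.2],  # hypothetical model B
    [5.2, 8.4, 11.6],  # hypothetical model C
    [4.6, 9.2, 12.6],  # hypothetical model D
]

blended = blend_predictions(per_model)
blended_score = fraction_within(blended, observed)
single_scores = [fraction_within(m, observed) for m in per_model]

print("blended RTs:", blended)
print("blended fraction with dRT < 1.0 min:", blended_score)
print("single-model fractions:", single_scores)
```

In this toy data, the errors of individual models partly cancel when averaged, which is the intuition behind blending. Whether a blend helps on real data depends on how correlated the models' errors are.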
