A Multi-Label Classifier for Predicting the Most Appropriate Instrumental Method for the Analysis of Contaminants of Emerging Concern.

Authors

Alygizakis Nikiforos, Konstantakos Vasileios, Bouziotopoulos Grigoris, Kormentzas Evangelos, Slobodnik Jaroslav, Thomaidis Nikolaos S

Affiliations

Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece.

Environmental Institute, Okružná 784/42, 97241 Kos, Slovakia.

Publication information

Metabolites. 2022 Feb 23;12(3):199. doi: 10.3390/metabo12030199.

Abstract

Liquid chromatography-high resolution mass spectrometry (LC-HRMS) and gas chromatography-high resolution mass spectrometry (GC-HRMS) have revolutionized analytical chemistry, among many other disciplines. In theory, these instruments can capture the whole chemical universe contained in a sample, which opens up major opportunities for the scientific community. Laboratories equipped with them produce large volumes of data every day, and these data can be digitally archived. Digital archiving makes retrospective suspect screening possible: stored chromatograms can be searched later for the occurrence of chemicals. The first step of this approach is to predict which data are most appropriate to search. In this study, we built an optimized multi-label classifier that predicts the most appropriate instrumental method (LC-HRMS, GC-HRMS, or both) for the analysis of chemicals in digital specimens. The approach involved building a baseline model that encodes the knowledge an expert would use, as well as an optimized machine learning model. A multi-step feature selection approach, a model selection strategy, and optimization of the classifier's hyperparameters led to a model whose accuracy outperformed the baseline implementation. The models were then used to predict the most appropriate instrumental technique for new substances. The scripts are available on GitHub and the dataset on Zenodo.
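
To make the multi-label framing concrete, the following is a minimal sketch, not the authors' implementation. It represents each compound's target as a pair of independent labels (LC-HRMS amenable, GC-HRMS amenable), applies a hypothetical rule-based baseline of the kind an expert might write, and scores it with exact-match accuracy. The descriptors (`mol_weight`, `log_p`), the thresholds, the compound names, and the reference labels are all illustrative assumptions and are not taken from the paper or its dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float  # g/mol (hypothetical descriptor)
    log_p: float       # octanol-water partition coefficient (hypothetical descriptor)


def baseline_labels(c: Compound) -> tuple[bool, bool]:
    """Return (lc_amenable, gc_amenable) using made-up, illustrative thresholds."""
    gc = c.mol_weight < 400 and c.log_p > 2.0  # small and fairly non-polar
    lc = c.log_p < 5.0                         # not extremely hydrophobic
    return lc, gc


def subset_accuracy(y_true, y_pred) -> float:
    """Fraction of compounds whose complete label pair is predicted exactly."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy compounds and reference labels, invented purely for illustration.
compounds = [
    Compound("polar_drug", 300.0, 0.5),
    Compound("semi_volatile", 250.0, 3.5),
    Compound("hydrophobic_small", 350.0, 6.5),
    Compound("hydrophobic_large", 600.0, 7.0),
]
y_true = [(True, False), (True, True), (False, True), (True, False)]

y_pred = [baseline_labels(c) for c in compounds]
accuracy = subset_accuracy(y_true, y_pred)
print(y_pred, accuracy)
```

Because the two labels are predicted independently, a compound can be assigned to one technique, both, or neither, which is what distinguishes a multi-label problem from an ordinary two-class one. A learned classifier would replace `baseline_labels` while keeping the same label layout and metric, so the two approaches can be compared directly.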


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8890/8949148/2514c0024787/metabolites-12-00199-g001.jpg
