Accurate prediction of isothermal gas chromatographic Kováts retention indices.

Affiliations

Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E9, Canada.

Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E8, Canada.

Publication information

J Chromatogr A. 2023 Aug 30;1705:464176. doi: 10.1016/j.chroma.2023.464176. Epub 2023 Jun 24.

Abstract

We describe a freely available web server called Retention Index Predictor (RIpred) (https://ripred.ca) that rapidly and accurately predicts gas chromatographic (GC) Kováts retention indices (RI) using SMILES strings as chemical structure input. RIpred performs RI prediction for three different stationary phases (semi-standard non-polar (SSNP), standard non-polar (SNP), and standard polar (SP)) for both derivatized (trimethylsilyl (TMS) and tert-butyldimethylsilyl (TBDMS) derivatized) and underivatized (base compound) forms of GC-amenable structures. RIpred was developed to address the need for freely available, fast, highly accurate RI predictions for a wide range of derivatized and underivatized chemicals for all common GC stationary phases. RIpred was trained using a Graph Neural Network (GNN) that used compound structures, their extracted features (mostly atom-level features) and the GC-RI data from the National Institute of Standards and Technology databases (NIST 17 and NIST 20). We curated these NIST 17 and NIST 20 GC-RI data, which are available for all three stationary phases, to create the inputs (molecular graphs in this case) needed to enhance model performance. The performance of different RIpred predictive models was evaluated using 10-fold cross-validation (CV). The best-performing RIpred models, when tested on hold-out test sets from all stationary phases, achieved a Mean Absolute Error (MAE) of <73 RI units (SSNP: 16.5-29.5, SNP: 38.5-45.9, SP: 46.52-72.53). The Mean Absolute Percentage Error (MAPE) of these models was typically within 3% (SSNP: 0.78-1.62%, SNP: 1.87-2.88%, SP: 2.34-4.05%). When compared to the best-performing model by Qu et al., 2021, RIpred performed similarly (MAE of 16.57 RI units [RIpred] vs. 16.84 RI units [Qu et al., 2021 predictor] for derivatized compounds). RIpred also includes ∼5 million predicted RI values for all GC-amenable compounds (∼57,000) in the Human Metabolome Database HMDB 5.0 (Wishart et al., 2022).
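
For readers unfamiliar with the two error metrics quoted above, the following is a minimal sketch of how MAE and MAPE are conventionally computed from paired observed and predicted RI values. It is not taken from RIpred; the function names and the four RI values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted RI, in RI units."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute error relative to the observed RI, expressed in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical values for illustration only; these are not data from the paper.
observed_ri = [1000.0, 1200.0, 1500.0, 2000.0]
predicted_ri = [1010.0, 1188.0, 1530.0, 1980.0]

mae = mean_absolute_error(observed_ri, predicted_ri)
mape = mean_absolute_percentage_error(observed_ri, predicted_ri)
print(f"MAE = {mae:.2f} RI units, MAPE = {mape:.2f}%")
```

Because MAPE divides each error by the observed RI, the same absolute error weighs more heavily on compounds with low retention indices, which is one reason the two metrics are reported side by side.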

