
A general procedure for finding potentially erroneous entries in the database of retention indices.

Authors

Khrisanfov Mikhail D, Matyushin Dmitriy D, Samokhin Andrey S

Affiliations

Chemistry Department, Lomonosov Moscow State University, Leninskie Gory 1-3, 119991, Moscow, Russia; A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071, Moscow, Russia.

A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, GSP-1, 119071, Moscow, Russia.

Publication information

Anal Chim Acta. 2024 Apr 8;1297:342375. doi: 10.1016/j.aca.2024.342375. Epub 2024 Feb 17.

DOI: 10.1016/j.aca.2024.342375
PMID: 38438243

Abstract

BACKGROUND

The NIST retention index database is one of the most widely used sources of retention indices. In both untargeted analysis and machine learning studies, filtering for potential errors is lacking or nonexistent. According to our estimates, about 80% of the compounds in both the NIST 17 and NIST 20 retention index databases have only one RI value per stationary phase, which makes searching for erroneous values with statistical methods impossible. Manual inspection is also impractical because the database contains more than 300,000 entries.
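The "one RI value per stationary phase" estimate can be illustrated with a small sketch. It assumes a hypothetical record layout of (compound, stationary phase, RI value) tuples and counts the share of compound/phase pairs that have exactly one value; the function name and the layout are illustrative assumptions, not the NIST database format or the authors' code.

```python
from collections import Counter
from typing import Iterable, Tuple


def singleton_fraction(records: Iterable[Tuple[str, str, float]]) -> float:
    """Fraction of (compound, phase) pairs that have exactly one RI value.

    `records` is a hypothetical layout: (compound_id, phase, ri_value).
    """
    # Count how many RI values exist for each compound/phase pair.
    counts = Counter((compound, phase) for compound, phase, _ri in records)
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)
```

A pair with a single value offers no replicate to compare against, which is why replicate-based outlier statistics are not applicable to most of the database.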

RESULTS

We suggest a two-step, machine-learning-based procedure for finding potentially erroneous retention indices. The first step is to use five predictive models to obtain predicted retention index values for the whole database. The second is to compare these predicted values against the experimental ones. We consider a retention index erroneous if its accuracy (the difference between the predicted and experimental values) is in the bottom 5% for each of the five models simultaneously. Using this method, we detected 2093 outlier entries for standard and semi-standard non-polar stationary phases in the NIST 17 retention index database; 566 of those were corrected or removed by the developers in NIST 20.
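The flagging rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "accuracy" is ranked by absolute prediction error, that the predictions of each model are already available as plain lists, and that the function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def worst_fraction_indices(errors: List[float], fraction: float) -> Set[int]:
    """Indices of the entries with the largest absolute errors."""
    n_flag = int(len(errors) * fraction)
    ranked = sorted(range(len(errors)), key=lambda i: abs(errors[i]), reverse=True)
    return set(ranked[:n_flag])


def flag_suspect_entries(
    experimental: List[float],
    predictions: Dict[str, List[float]],
    fraction: float = 0.05,
) -> List[int]:
    """Entries that fall in the worst `fraction` for every model at once."""
    flagged: Optional[Set[int]] = None
    for _model_name, predicted in predictions.items():
        # Assumption: error is predicted minus experimental, ranked by magnitude.
        errors = [p - e for p, e in zip(predicted, experimental)]
        worst = worst_fraction_indices(errors, fraction)
        # Keep only entries that every model so far places among its worst.
        flagged = worst if flagged is None else flagged & worst
    return sorted(flagged or [])
```

Requiring agreement across all models is the conservative part of the design: an entry that only one model predicts poorly is more likely a weakness of that model than an error in the database.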

SIGNIFICANCE

This is a novel approach to find potentially erroneous entries in a large-scale database with mostly unique entries, which can be applied not only to retention indices. The procedure can help filter and report mishandled data to improve the quality of the dataset for machine learning applications and experimental use.

Similar articles

1. A general procedure for finding potentially erroneous entries in the database of retention indices.
   Anal Chim Acta. 2024 Apr 8;1297:342375. doi: 10.1016/j.aca.2024.342375. Epub 2024 Feb 17.
2. Critical evaluation of the NIST retention index database reliability with specific examples.
   Anal Bioanal Chem. 2024 Nov;416(28):6181-6186. doi: 10.1007/s00216-024-05562-9. Epub 2024 Sep 27.
3. Validation of the identification reliability of known and assumed UDMH transformation products using gas chromatographic retention indices and machine learning.
   Chemosphere. 2024 Aug;362:142679. doi: 10.1016/j.chemosphere.2024.142679. Epub 2024 Jun 21.
4. Large-scale statistical study of the dependence of retention index on heating rate in temperature-programmed gas chromatography.
   J Chromatogr A. 2024 Sep 13;1732:465223. doi: 10.1016/j.chroma.2024.465223. Epub 2024 Aug 2.
5. Accurate prediction of isothermal gas chromatographic Kováts retention indices.
   J Chromatogr A. 2023 Aug 30;1705:464176. doi: 10.1016/j.chroma.2023.464176. Epub 2023 Jun 24.
6. Predicting Kováts Retention Indices Using Graph Neural Networks.
   J Chromatogr A. 2021 Jun 7;1646:462100. doi: 10.1016/j.chroma.2021.462100. Epub 2021 Mar 25.
7. Deep Learning Based Prediction of Gas Chromatographic Retention Indices for a Wide Variety of Polar and Mid-Polar Liquid Stationary Phases.
   Int J Mol Sci. 2021 Aug 25;22(17):9194. doi: 10.3390/ijms22179194.
8. Development of a database of gas chromatographic retention properties of organic compounds.
   J Chromatogr A. 2007 Jul 20;1157(1-2):414-21. doi: 10.1016/j.chroma.2007.05.044. Epub 2007 May 18.
9. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
10. An accurate and easy procedure to obtain isothermal Kováts retention indices in gas chromatography.
    J Sep Sci. 2006 Dec;29(18):2785-92. doi: 10.1002/jssc.200600110.

Cited by

1. Critical evaluation of the NIST retention index database reliability with specific examples.
   Anal Bioanal Chem. 2024 Nov;416(28):6181-6186. doi: 10.1007/s00216-024-05562-9. Epub 2024 Sep 27.