Australian Centre for Research on Separation Science (ACROSS), School of Physical Sciences-Chemistry, University of Tasmania, Private Bag 75, Hobart, 7001 Tasmania, Australia.
Pfizer Global Research and Development, Sandwich CT13 9NJ, U.K.
Anal Chem. 2018 Aug 7;90(15):9434-9440. doi: 10.1021/acs.analchem.8b02084. Epub 2018 Jul 10.
Structure identification in nontargeted metabolomics based on liquid chromatography coupled to mass spectrometry (LC-MS) remains a significant challenge. Quantitative structure-retention relationship (QSRR) modeling can accelerate the structure identification of metabolites by predicting their retention, which allows false positives to be eliminated during the interpretation of metabolomics data. In this work, 191 compounds were grouped according to molecular weight, and a QSRR study was carried out on the 34 resulting groups to eliminate false positives. Partial least squares (PLS) regression combined with a genetic algorithm (GA) was applied to construct linear QSRR models based on a variety of VolSurf+ molecular descriptors. A novel dual-filtering approach, which combines Tanimoto similarity (TS) searching as the primary filter and retention index (RI) similarity clustering as the secondary filter, was used to select the training-set compounds from which the QSRR models were derived; the models yielded an R of 0.8512 and an average root mean square error in prediction (RMSEP) of 8.45%. With a retention index filter set at ±2 standard deviations (SD) of the prediction error, representative compounds were predicted with >91% accuracy, and for 53% of the groups (18/34) at least one false positive compound could be eliminated. The proposed strategy can thus narrow down the number of false positives that must be assessed in nontargeted metabolomics.
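The two filtering ideas above (Tanimoto similarity for choosing training compounds, and a ±2 SD retention window for rejecting candidates) can be illustrated with a minimal sketch. It assumes fingerprints are Python sets of "on" bit indices. The function names, the similarity threshold of 0.5, and the toy data are hypothetical. The RI similarity clustering step and the VolSurf+/PLS-GA models are not reproduced; the predicted RI values are simply given as inputs.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_training_set(query: Set[int], library: Dict[str, Set[int]],
                        threshold: float = 0.5) -> List[str]:
    """Primary filter (hypothetical threshold): keep library compounds similar to the query."""
    return [name for name, fp in library.items() if tanimoto(query, fp) >= threshold]


def passes_ri_filter(observed_ri: float, predicted_ri: float,
                     error_sd: float, k: float = 2.0) -> bool:
    """Keep a candidate only if the observed RI lies within +/- k SD of its predicted RI."""
    return abs(observed_ri - predicted_ri) <= k * error_sd


if __name__ == "__main__":
    # Toy fingerprints (hypothetical data, not from the study).
    library = {"A": {1, 2, 3, 4}, "B": {1, 2, 9}, "C": {7, 8}}
    query = {1, 2, 3}
    print(select_training_set(query, library))  # ['A', 'B']

    # Candidate structures sharing one molecular weight, with hypothetical predicted RIs.
    predicted = {"X": 1480.0, "Y": 1620.0}
    observed_feature_ri = 1500.0
    model_error_sd = 25.0
    kept = [name for name, ri in predicted.items()
            if passes_ri_filter(observed_feature_ri, ri, model_error_sd)]
    print(kept)  # ['X']  (Y falls outside the 1450-1550 window and is rejected)
```

In this toy run, candidate Y is discarded as a likely false positive because its predicted RI differs from the observed feature by more than two error standard deviations.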