Cao Mingshu, Fraser Karl, Huege Jan, Featonby Tom, Rasmussen Susanne, Jones Chris
AgResearch Grasslands Research Centre, Palmerston North, 4442 New Zealand.
Massey University, Institute of Agriculture and Environment, Palmerston North, New Zealand.
Metabolomics. 2015;11(3):696-706. doi: 10.1007/s11306-014-0727-x. Epub 2014 Sep 7.
Liquid chromatography coupled to mass spectrometry (LCMS) is widely used in metabolomics due to its sensitivity, reproducibility, speed and versatility. Metabolites are detected as peaks which are characterised by mass-over-charge ratio (m/z) and retention time (rt), and one of the most critical but also most challenging tasks in metabolomics is to annotate the large number of peaks detected in biological samples. Accurate m/z measurements enable the prediction of molecular formulae, which provide clues to the chemical identity of peaks, but often a number of metabolites have identical molecular formulae. Chromatographic behaviour, reflecting the physicochemical properties of metabolites, should also provide structural information. However, the variation in rt between analytical runs, and the complicating factors underlying the observed time shifts, make the use of such information for peak annotation a non-trivial task. To this end, we conducted Quantitative Structure-Retention Relationship (QSRR) modelling between the calculated molecular descriptors (MDs) and the experimental retention times (rts) of 93 authentic compounds analysed using hydrophilic interaction liquid chromatography (HILIC) coupled to high resolution MS. A predictive QSRR model based on the Random Forests algorithm outperformed a Multiple Linear Regression based model, and achieved a high correlation between predicted rts and experimental rts (Pearson's correlation coefficient = 0.97), with mean and median absolute errors of 0.52 min and 0.34 min (corresponding to 5.1 % and 3.2 % error), respectively. We demonstrate that rt prediction with the precision achieved enables the systematic utilisation of rts for annotating unknown peaks detected in a metabolomics study. Applying the QSRR model with the strategy we outlined enhanced the peak annotation process by reducing the number of false positives that result from database queries matching accurate mass alone, and by enriching the reference library. The predicted rts were validated using either authentic compounds or ion fragmentation patterns.
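As an illustration of how predicted rts can prune database hits that match on accurate mass alone, the following is a minimal Python sketch, not the authors' implementation. The function names, the 5 ppm mass tolerance, the 1.0 min rt window, and all example values are hypothetical assumptions chosen for illustration. The predicted rts are taken as given here; in the study they come from the Random Forests QSRR model. The sketch also shows how the reported agreement statistics (Pearson's correlation, mean and median absolute error) can be computed.

```python
from math import sqrt
from statistics import mean, median


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def absolute_errors(predicted, observed):
    """Per-compound absolute error between predicted and experimental rts (min)."""
    return [abs(p - o) for p, o in zip(predicted, observed)]


def filter_candidates(peak_mz, peak_rt, candidates, ppm_tol=5.0, rt_tol=1.0):
    """Keep candidates that match the peak on m/z AND on predicted rt.

    candidates: iterable of (name, theoretical_mz, predicted_rt) tuples.
    ppm_tol and rt_tol are assumed tolerances, not values from the paper.
    """
    kept = []
    for name, cand_mz, predicted_rt in candidates:
        ppm = abs(cand_mz - peak_mz) / cand_mz * 1e6
        if ppm <= ppm_tol and abs(predicted_rt - peak_rt) <= rt_tol:
            kept.append(name)
    return kept


# Hypothetical predicted vs. experimental rts (minutes) for four compounds.
predicted = [2.0, 4.5, 7.0, 9.0]
observed = [2.5, 4.0, 7.2, 9.1]
errors = absolute_errors(predicted, observed)
r = pearson(predicted, observed)
mae, medae = mean(errors), median(errors)

# Hypothetical database hits: two isomers share an m/z, a third does not match.
hits = [
    ("isomer_A", 132.1019, 6.2),
    ("isomer_B", 132.1019, 9.8),
    ("other_compound", 133.0608, 6.4),
]
annotated = filter_candidates(peak_mz=132.1020, peak_rt=6.5, candidates=hits)
print(annotated)
```

In this toy case accurate mass alone cannot separate the two isomers, while the rt window removes the one whose predicted rt lies far from the observed peak. A practical rt window would have to be chosen with the model's prediction error in mind.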