Tyteca Eva, Talebi Mohammad, Amos Ruth, Park Soo Hyun, Taraji Maryam, Wen Yabin, Szucs Roman, Pohl Christopher A, Dolan John W, Haddad Paul R
Australian Centre for Research on Separation Science (ACROSS), School of Physical Sciences-Chemistry, University of Tasmania, Private Bag 75, Hobart 7001, Australia; Department of Chemical Engineering (CHIS), Vrije Universiteit Brussel, Pleinlaan 2, Brussels, Belgium.
Australian Centre for Research on Separation Science (ACROSS), School of Physical Sciences-Chemistry, University of Tasmania, Private Bag 75, Hobart 7001, Australia.
J Chromatogr A. 2017 Feb 24;1486:50-58. doi: 10.1016/j.chroma.2016.09.062. Epub 2016 Sep 27.
Quantitative Structure-Retention Relationships (QSRR) have the potential to speed up the screening phase of chromatographic method development, because the initial exploratory experiments are replaced by prediction of analyte retention based solely on the structure of the molecule. The present study offers further proof-of-concept of localized QSRR modelling, in which the retention of any given compound is predicted using only the most chromatographically similar compounds in the available dataset. To this end, each compound in the dataset was sequentially removed from the database and individually utilized as a test analyte. In this study, we propose the retention factor k as the most relevant chromatographic similarity measure and compare it with the Tanimoto index, the most popular similarity measure based on chemical structure. Prediction error was reduced by up to 8-fold when QSRR was based only on chromatographically similar compounds rather than on the entire dataset. The study therefore shows that a practically useful structural similarity index should be designed to select the same compounds in the dataset as the k-similarity filter does, in order to establish accurate predictive localized QSRR models. While low average prediction errors (Mean Absolute Error (MAE) < 0.5 min) and slopes of the regression lines through the origin close to 1.00 were obtained using k-similarity searching, the use of the structural Tanimoto similarity index, considered the gold standard in Quantitative Structure-Activity Relationships (QSAR) studies, generally resulted in much higher prediction errors (MAE > 1 min) and significant deviations from the reference slope of 1.0. The Tanimoto similarity index therefore appears to have limited general utility in QSRR studies.
Future studies therefore aim at designing a more appropriate chromatographic similarity index that can then be applied for unknown compounds (that is, compounds which have not been tested previously on the chromatographic system used, but for which the chemical structures are known).
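The leave-one-out, similarity-filtered workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the compounds, the single descriptor `x`, the fingerprint bit sets `fp`, the neighbor count, the one-descriptor linear model, and the use of the absolute difference in k as the k-similarity measure are all hypothetical choices made for illustration. The reported error is in units of the toy retention value, not minutes.

```python
from statistics import mean

# Hypothetical toy data: two "chromatographic families" with opposite
# descriptor-retention trends. x = one descriptor, k = retention factor,
# fp = set of "on" bits of a structural fingerprint (all values invented).
COMPOUNDS = [
    {"x": 1, "k": 1.0,  "fp": {1, 2}},
    {"x": 2, "k": 2.0,  "fp": {3, 4}},
    {"x": 3, "k": 3.0,  "fp": {1, 2, 5}},
    {"x": 4, "k": 4.0,  "fp": {3, 4, 6}},
    {"x": 1, "k": 19.0, "fp": {1, 2, 7}},
    {"x": 2, "k": 18.0, "fp": {3, 4, 8}},
    {"x": 3, "k": 17.0, "fp": {1, 5}},
    {"x": 4, "k": 16.0, "fp": {3, 6}},
]

def tanimoto(a, b):
    """Tanimoto index of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def tanimoto_similarity(c1, c2):
    """Structural similarity of two compounds from their fingerprints."""
    return tanimoto(c1["fp"], c2["fp"])

def k_similarity(c1, c2):
    """Assumed k-similarity: the closer the retention factors, the higher."""
    return -abs(c1["k"] - c2["k"])

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (one descriptor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all neighbors share the same x: fall back to the mean
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def loo_local_mae(compounds, similarity, n_neighbors):
    """Leave-one-out MAE: each compound is removed in turn and predicted
    from a model built on its n most similar remaining compounds."""
    errors = []
    for i, test in enumerate(compounds):
        rest = compounds[:i] + compounds[i + 1:]
        ranked = sorted(rest, key=lambda c: similarity(test, c), reverse=True)
        neighbors = ranked[:n_neighbors]
        slope, intercept = fit_line([c["x"] for c in neighbors],
                                    [c["k"] for c in neighbors])
        errors.append(abs(slope * test["x"] + intercept - test["k"]))
    return mean(errors)

if __name__ == "__main__":
    n_all = len(COMPOUNDS) - 1  # using every other compound = global model
    print("global model MAE:  ", loo_local_mae(COMPOUNDS, k_similarity, n_all))
    print("k-similarity MAE:  ", loo_local_mae(COMPOUNDS, k_similarity, 3))
    print("Tanimoto filter MAE:", loo_local_mae(COMPOUNDS, tanimoto_similarity, 3))
```

In this toy set the k-similarity filter wins by construction, because the fingerprints were deliberately chosen not to track the two retention families; it illustrates the mechanics only and is not evidence for the reported results. Note also that filtering on k requires the measured retention of the test compound, which is why the abstract treats it as a benchmark that a structure-based index would need to reproduce.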