Taraji Maryam, Haddad Paul R, Amos Ruth I J, Talebi Mohammad, Szucs Roman, Dolan John W, Pohl Christopher A
Australian Centre for Research on Separation Science (ACROSS), School of Physical Sciences-Chemistry, University of Tasmania, Private Bag 75, Hobart 7001, Australia.
J Chromatogr A. 2017 Jul 21;1507:53-62. doi: 10.1016/j.chroma.2017.05.044. Epub 2017 May 23.
The development of quantitative structure-retention relationships (QSRR) accurate enough to support high-performance liquid chromatography (HPLC) method development remains a major challenge. To address it, this study presents a novel QSRR methodology for selecting the training set of compounds used in QSRR modelling (i.e. filtering the database to identify the most appropriate compounds for the training set). The selection is based on a dual filtering strategy that combines Tanimoto similarity (TS) searching as the primary filter with retention time (tR) similarity clustering as the secondary filter, using a database of pharmaceutical compound retention times collected over a wide range of hydrophilic interaction liquid chromatography (HILIC) systems. To apply tR similarity filtering, correlation to a molecular descriptor is used as a measure of retention time. To model the retention time of a compound, a relationship between experimental chromatographic data and various molecular descriptors is calculated using genetic algorithm-partial least squares (GA-PLS) regression. The proposed dual-filtering-based QSRR model significantly improves retention time predictability compared with the diverse, global, and TS-based QSRR models, with an average root mean square error in prediction (RMSEP) of 11.01% over five different HILIC stationary phases. The average CPU time for implementing the proposed approach is less than 10 min, which makes it well suited to rapid method development in HILIC. In addition, interpretation of the molecular descriptors selected by this novel approach provided some insight into the HILIC mechanism.
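The dual filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Compound` class, the set-based fingerprint encoding, the `ts_min` and `proxy_window` thresholds, and the toy data are all hypothetical, and the secondary filter is simplified here to a fixed window around the target's retention proxy rather than the clustering step the study uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    bits: frozenset   # on-bits of a structural fingerprint (hypothetical encoding)
    rt_proxy: float   # descriptor value standing in for retention time (assumption)


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dual_filter(target, database, ts_min=0.5, proxy_window=1.0):
    """Primary filter: structural similarity to the target.
    Secondary filter: closeness of the retention proxy to the target's.
    Both thresholds are illustrative values, not taken from the study."""
    primary = [c for c in database if tanimoto(target.bits, c.bits) >= ts_min]
    return [c for c in primary if abs(c.rt_proxy - target.rt_proxy) <= proxy_window]


# Toy data, invented for illustration only.
target = Compound("target", frozenset({1, 2, 3, 4}), 2.0)
database = [
    Compound("a", frozenset({1, 2, 3, 5}), 2.4),     # similar structure, similar proxy
    Compound("b", frozenset({1, 2, 3, 4, 6}), 5.0),  # similar structure, distant proxy
    Compound("c", frozenset({7, 8}), 2.1),           # dissimilar structure
]
training_set = dual_filter(target, database)
print([c.name for c in training_set])  # → ['a']
```

In this sketch only compound `a` survives both filters; the compounds that pass would then form the training set for a regression model such as GA-PLS.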