Department of Analytical Chemistry, Faculty of Natural Sciences, Comenius University Bratislava, Ilkovičova 6, SK-84215 Bratislava, Slovakia.
Pfizer R&D UK Limited, Ramsgate Road, Sandwich CT13 9NJ, UK.
J Chromatogr A. 2023 Sep 27;1707:464317. doi: 10.1016/j.chroma.2023.464317. Epub 2023 Aug 19.
Quantitative Structure-Retention Relationships offer a valuable tool for de-risking chromatographic methods in relation to newly formed or hypothetical compounds arising from synthetic processes or formulation activities. They can also be used to identify optimal separation conditions, or in support of structural elucidation. In this contribution, we provide a systematic study of the relationship between the accuracy of the retention model, the size of the training set and its structural similarity to the predicted compound. We compare structural similarity expressed either on a fingerprint basis (e.g., Tanimoto index) or by the Euclidean distance calculated from a subset of molecular descriptors. The results indicate that accurate and predictive models can be built from a small dataset containing as few as 25 compounds, provided that the training set is structurally similar to the test compound. When the training set contains compounds selected by minimizing the Euclidean distance calculated from the 3 descriptors most correlated with the retention time, a root mean square error of 0.48 min and a correlation coefficient of 0.9464 were observed for the test sets of 104 compounds. Moreover, these models meet the Tropsha predictivity criteria. These findings potentially bring the prediction of retention times within the practical reach of pharmaceutical analysts involved in chromatographic method development. We also present an optimisation approach to select algorithm settings in order to minimize the prediction error and ensure model predictivity.
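The similarity-driven training-set selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the z-score scaling of descriptors, and the use of ordinary least squares as the retention model are all assumptions made for illustration, since the abstract does not specify them. Only the ideas of ranking descriptors by correlation with retention time, keeping the 3 most correlated, and choosing the 25 nearest compounds by Euclidean distance come from the text.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto index between two binary fingerprints (shared bits / bits set in either)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)

def top_correlated(X, rt, k=3):
    """Indices of the k descriptor columns with the largest |Pearson r| to retention time."""
    Xc = X - X.mean(axis=0)
    yc = rt - rt.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    return np.argsort(-np.abs(r))[:k]

def nearest_training_set(X, query, cols, n=25):
    """Indices of the n compounds closest to the query compound.

    Distance is Euclidean over the selected descriptor columns.
    Assumption: descriptors are z-scored first so that no single one dominates.
    """
    mu = X[:, cols].mean(axis=0)
    sd = X[:, cols].std(axis=0)
    Z = (X[:, cols] - mu) / sd
    q = (query[cols] - mu) / sd
    dist = np.linalg.norm(Z - q, axis=1)
    return np.argsort(dist)[:n]

def predict_rt(X, rt, query, k=3, n=25):
    """Predict retention time for one query compound from its n nearest neighbours.

    Assumption: a plain linear least-squares model stands in for the retention model.
    """
    cols = top_correlated(X, rt, k)
    idx = nearest_training_set(X, query, cols, n)
    A = np.column_stack([np.ones(len(idx)), X[idx][:, cols]])
    coef, *_ = np.linalg.lstsq(A, rt[idx], rcond=None)
    return float(coef[0] + query[cols] @ coef[1:])
```

Here `X` is a hypothetical compounds-by-descriptors matrix, `rt` the measured retention times, and `query` the descriptor vector of the compound to be predicted. In a real workflow the query compound would be excluded from `X`, and `k` and `n` would be the kind of algorithm settings the optimisation step tunes.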