Shi Zhenqi, Yi Yuyan, Madrigal Eddie, Hrovat Frank, Zhang Kelly, Lin Jessica
Synthetic Molecule Pharmaceutical Science, gRED, Genentech, Inc., 1 DNA Way, South San Francisco, CA, 94080, United States.
J Chromatogr A. 2025 Feb 8;1742:465628. doi: 10.1016/j.chroma.2024.465628. Epub 2024 Dec 30.
Quantitative structure-retention relationship (QSRR) modeling is an active field of research, primarily focused on predicting chromatographic retention time (Rt) from the molecular structure of an input analyte on a single or limited number of reversed-phase HPLC (RP-HPLC) columns. However, in pharmaceutical chemistry, manufacturing, and controls (CMC) settings, single-column QSRR models are often insufficient. It is important to translate retention time across different HPLC methods, specifically different stationary phases (SPs) and mobile phases (MPs), to guide HPLC method development and to bridge organic impurity profiles across different development phases and laboratories. In response to this need, we present a novel approach for retention time transfer across SPs and MPs that does not require pre-existing Rt data on the target column. To achieve this, we developed an RP-HPLC based Genentech Multi-column Retention Time (GMCRT) database containing 51 small molecule pharmaceutical compounds analyzed on twenty SPs at multiple pH levels. The database incorporates the SP selectivity parameters from the Hydrophobic Subtraction Model (HSM): hydrophobicity (H), steric hindrance (S), hydrogen-bond acidity (A), hydrogen-bond basicity (B), and ionic interaction (C) at two pH values (2.8 and 7), together with the ethylbenzene (EB) retention factor. Two machine learning approaches, partial least squares (PLS) and artificial neural networks (ANN), were found to improve the accuracy of Rt prediction on new SPs compared to the previously published direct mapping approach, especially when the RP-HPLC columns differ significantly in selectivity. In contrast to direct mapping, our approach does not require pre-existing retention data on the target SPs, and it is generalizable to any RP-HPLC column with a set of known column selectivity parameters (https://www.hplccolumns.org/).
This generalizability is achieved in two ways: through the retention data correlations among the twenty commonly used RP-HPLC columns in GMCRT, and through retraining of our ML models, in which the Rt values of the compounds of interest on the source columns are added to GMCRT before Rt is predicted on the target column. Thus, we propose a new QSRR framework that incorporates the physicochemical properties of SPs and MPs and makes retention time prediction transferable across SPs and MPs. Such a framework is expected to open up possibilities for developing more comprehensive and generalizable models, and to streamline RP-HPLC method development and lifecycle management across various pharmaceutical CMC development phases.
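To make the role of the column selectivity parameters concrete, the sketch below evaluates the standard HSM relation, log(k/k_EB) = η'H − σ'S + β'A + α'B + κ'C, for one solute on one column. It is not the PLS or ANN model described above; it only illustrates how the descriptors H, S, A, B, C and the EB retention factor relate to retention. The class names, function names, and all numeric values are hypothetical, and the conversion from retention factor to retention time assumes isocratic elution with a known dead time t0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnHSM:
    """HSM column parameters. Values used with this class are illustrative only."""
    H: float     # hydrophobicity
    S: float     # steric hindrance
    A: float     # hydrogen-bond acidity
    B: float     # hydrogen-bond basicity
    C: float     # ionic interaction at the working pH (e.g. 2.8 or 7)
    k_eb: float  # retention factor of ethylbenzene on this column


@dataclass(frozen=True)
class SoluteHSM:
    """HSM solute coefficients (hypothetical; one set per analyte and mobile phase)."""
    eta: float    # pairs with H
    sigma: float  # pairs with S
    beta: float   # pairs with A
    alpha: float  # pairs with B
    kappa: float  # pairs with C


def log_selectivity(col: ColumnHSM, sol: SoluteHSM) -> float:
    """Return log10(k / k_EB) from the HSM relation."""
    return (sol.eta * col.H - sol.sigma * col.S + sol.beta * col.A
            + sol.alpha * col.B + sol.kappa * col.C)


def retention_factor(col: ColumnHSM, sol: SoluteHSM) -> float:
    """Return the retention factor k of the solute on the column."""
    return col.k_eb * 10.0 ** log_selectivity(col, sol)


def retention_time(k: float, t0: float) -> float:
    """Return the isocratic retention time t_R = t0 * (1 + k)."""
    return t0 * (1.0 + k)


if __name__ == "__main__":
    # Hypothetical source and target columns and a hypothetical analyte.
    source = ColumnHSM(H=1.00, S=0.02, A=-0.10, B=0.01, C=0.10, k_eb=6.0)
    target = ColumnHSM(H=0.85, S=-0.03, A=0.05, B=0.03, C=0.40, k_eb=4.5)
    analyte = SoluteHSM(eta=-0.5, sigma=0.3, beta=0.2, alpha=0.1, kappa=0.4)
    for name, col in (("source", source), ("target", target)):
        k = retention_factor(col, analyte)
        print(f"{name}: k = {k:.3f}, Rt = {retention_time(k, t0=1.2):.3f} min")
```

In this relation ethylbenzene is the reference solute, so all of its coefficients are zero and its retention factor on any column is simply k_EB. A learned model such as PLS or ANN would take the column descriptors as input features rather than requiring fitted solute coefficients for each analyte.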