Improved hydrophobic subtraction model of reversed-phase liquid chromatography selectivity based on a large dataset with a focus on isomer selectivity.

Affiliations

Department of Chemistry, Virginia Commonwealth University, Box 842006, Richmond, VA 23284-2006, USA.

Department of Chemistry, Gustavus Adolphus College, 800 W. College Ave., St. Peter, MN 56082, USA.

Publication information

J Chromatogr A. 2024 Aug 30;1731:465127. doi: 10.1016/j.chroma.2024.465127. Epub 2024 Jun 29.

Abstract

Reversed-phase (RP) liquid chromatography is an important tool for the characterization of materials and products in the pharmaceutical industry. Method development is still challenging in this application space, particularly when dealing with closely related compounds. Models of chromatographic selectivity are useful for predicting which columns, out of the hundreds available, are likely to have very similar, or different, selectivity for the application at hand. The hydrophobic subtraction model (HSM1) has been widely employed for this purpose; the column database for this model currently contains 750 columns. In previous work we explored a refinement of the original HSM1 (HSM2) and found that increasing the size of the dataset used to train the model dramatically reduced the number of gross errors in the selectivity predictions it made. In this paper we describe further work in this direction (HSM3), this time based on a much larger solute set (1014 solute/stationary phase combinations) containing selectivities for compounds covering a broader range of physicochemical properties than HSM1. The molecular weight range was doubled, and the range of the logarithm of the octanol/water partition coefficient was increased slightly. The number of active pharmaceutical ingredients and related synthetic intermediates and impurities was increased from four to 28, and ten pairs of closely related structures (e.g., geometric and cis-/trans- isomers) were included. The HSM3 model is based on retention measurements for 75 compounds using 13 RP stationary phases and a mobile phase of 40/60 acetonitrile/25 mM ammonium formate buffer at pH 3.2. This data-driven model produced predictions of ln α (chromatographic selectivity using ethylbenzene as the reference compound) with average absolute errors of approximately 0.033, which corresponds to errors in α of about 3 %.
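The conversion from an error in ln α to an error in α can be checked with a short calculation. This is a minimal sketch, assuming that an absolute error e in ln α means the predicted and observed α differ by a factor of exp(e); the function name `ln_alpha_error_to_percent` is hypothetical and not from the paper.

```python
import math


def ln_alpha_error_to_percent(abs_err_ln_alpha: float) -> float:
    """Convert an absolute error in ln(alpha) into a relative error in alpha, in percent.

    If ln(alpha_pred) = ln(alpha_obs) + e, then alpha_pred / alpha_obs = exp(e),
    so the relative error in alpha is exp(e) - 1.
    """
    return (math.exp(abs_err_ln_alpha) - 1.0) * 100.0


# Average absolute error in ln(alpha) reported for HSM3.
print(f"{ln_alpha_error_to_percent(0.033):.1f} %")  # prints "3.4 %"
```

For small e, exp(e) − 1 ≈ e, so an error of 0.033 in ln α maps to roughly 3.3 to 3.4 % in α. Because exp is nonlinear, applying it to an average error only approximates the average relative error, which is consistent with the "about 3 %" stated above.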
In some cases, the predictions of trans-/cis- selectivities for positional and geometric isomers were relatively accurate, and the driving forces for the observed selectivity could be inferred by examining the relative magnitudes of the terms in the HSM3 model. For some geometric isomer pairs, the interactions mainly responsible for the observed selectivities could not be identified because of large uncertainties in particular terms of the model. This suggests that future work should explore other HSM-type models and further expand the training dataset to improve the predictive accuracy of these models. Additionally, we release with this paper a much larger data set (43,329 total retention measurements) at multiple mobile phase compositions, to enable other researchers to pursue their own lines of inquiry related to RP selectivity.
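Inferring driving forces from the relative magnitudes of model terms can be sketched for a linear HSM-type model. Assuming, for illustration only, the five-term structure of HSM1 written in terms of ln α (ln α = η′H − σ′S* + β′A + α′B + κ′C; the actual HSM3 terms may differ), the column parameters are shared by both isomers on a given column, so ln(α_trans/α_cis) is a sum of per-term differences. All parameter and descriptor values below are hypothetical, and the helper names are illustrative, not the paper's.

```python
from typing import Dict, Tuple

# Hypothetical column parameters for one stationary phase (HSM1-style names; S stands for S*).
COLUMN: Dict[str, float] = {"H": 1.00, "S": 0.03, "A": -0.05, "B": 0.02, "C": 0.10}

# Each column parameter is paired with a solute descriptor and the sign of its term:
# ln(alpha) = eta*H - sigma*S + beta*A + alpha*B + kappa*C  (assumed HSM1-like form)
TERMS: Dict[str, Tuple[str, int]] = {
    "H": ("eta", +1),
    "S": ("sigma", -1),
    "A": ("beta", +1),
    "B": ("alpha", +1),
    "C": ("kappa", +1),
}


def term_contributions(column: Dict[str, float], solute: Dict[str, float]) -> Dict[str, float]:
    """Contribution of each model term to ln(alpha) for one solute on one column."""
    return {col: sign * solute[desc] * column[col] for col, (desc, sign) in TERMS.items()}


def isomer_selectivity(
    column: Dict[str, float], trans: Dict[str, float], cis: Dict[str, float]
) -> Tuple[float, Dict[str, float]]:
    """Return ln(alpha_trans / alpha_cis) and its per-term breakdown."""
    t = term_contributions(column, trans)
    c = term_contributions(column, cis)
    diff = {k: t[k] - c[k] for k in t}
    return sum(diff.values()), diff


# Hypothetical descriptors for a trans/cis pair (not measured values).
trans = {"eta": 0.80, "sigma": 0.10, "beta": 0.0, "alpha": 0.0, "kappa": 0.0}
cis = {"eta": 0.75, "sigma": 0.20, "beta": 0.0, "alpha": 0.0, "kappa": 0.0}

ln_sel, parts = isomer_selectivity(COLUMN, trans, cis)
dominant = max(parts, key=lambda k: abs(parts[k]))  # term with the largest magnitude
print(f"ln(alpha_trans/alpha_cis) = {ln_sel:.3f}; dominant term: {dominant}")
```

With these made-up values the hydrophobicity term dominates the isomer selectivity. If a descriptor difference carries a large uncertainty, that uncertainty passes directly into the corresponding term, which is how the attribution of selectivity to a particular interaction can become ambiguous.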
