Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, al. Gen. J. Hallera 107, Gdańsk, 80-416, Poland.
Anal Bioanal Chem. 2022 May;414(11):3471-3481. doi: 10.1007/s00216-022-03968-x. Epub 2022 Mar 28.
Chromatographic retention times are usually modeled one analyte at a time. This approach has limitations: no information is shared between analytes, and consequently the model predictions generalize poorly to out-of-sample analytes. In this work, a publicly available dataset was used to illustrate the benefits of pooling the individual data and analyzing them simultaneously using a Bayesian hierarchical approach. Statistical analysis was carried out using the Stan program coupled with R, which enables full Bayesian inference with Markov chain Monte Carlo sampling. This methodology allows (i) incorporating prior knowledge about the likely values of model parameters, (ii) considering the between-analyte variability and the correlation between the model parameters, (iii) explaining the between-analyte variability by available predictors, and (iv) sharing information across the analytes. The last of these is especially valuable when the data contain only limited information about certain model parameters. The results are obtained in the form of a posterior probability distribution, which quantifies uncertainty about the model parameters and predictions. Posterior probability is also directly relevant for decision-making. In this work, we used the Neue model to describe the relationship between retention factor and acetonitrile content in the mobile phase for 1026 analytes. The model was parametrized in terms of retention factor in 100% water, retention factor in 100% acetonitrile, and curvature coefficient, and considered log P and pK as predictors. From this analysis, we discovered that the analytes formed two clusters with different retention depending on the degree of analyte dissociation. The final model turned out to be well calibrated with the data. It gives insight into the behavior of analytes in the chromatographic column and can be used to make predictions for a structurally diverse set of analytes if their log P and pK values are known.
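The parametrization described above can be illustrated with a small Python sketch. The abstract does not state the equation, so the functional form below is an assumption: a simplified Neue-type relationship written so that its three parameters are the log retention factor in 100% water, the log retention factor in 100% acetonitrile, and a curvature coefficient. The function name `neue_log_k`, the argument names, and the numerical values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def neue_log_k(phi, log_kw, log_ka, s2):
    """Log10 retention factor at acetonitrile fraction phi (0 to 1).

    Assumed form (not quoted from the source):
        log k = log_kw - s1 * (1 + s2) * phi / (1 + s2 * phi)
    with s1 = log_kw - log_ka, so that
        phi = 0 gives log_kw (100% water) and
        phi = 1 gives log_ka (100% acetonitrile).
    s2 is the curvature coefficient; s2 = 0 gives a straight line.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie between 0 and 1")
    if s2 <= -1.0:
        raise ValueError("s2 must exceed -1 to keep the denominator positive")
    s1 = log_kw - log_ka  # total drop in log k from water to acetonitrile
    return log_kw - s1 * (1.0 + s2) * phi / (1.0 + s2 * phi)


if __name__ == "__main__":
    # Hypothetical parameter values for a single analyte.
    log_kw, log_ka, s2 = 3.0, -1.0, 2.0
    for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"phi = {phi:.2f}  log k = {neue_log_k(phi, log_kw, log_ka, s2):.3f}")
```

In a hierarchical analysis of the kind described, each analyte would have its own parameter triple drawn from a shared population distribution, with the population mean depending on predictors such as log P; the sketch covers only the single-analyte curve.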