A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, 31 Leninsky Prospect, Moscow, GSP-1, 119071, Russia.
J Chromatogr A. 2024 Sep 13;1732:465223. doi: 10.1016/j.chroma.2024.465223. Epub 2024 Aug 2.
Retention indices are values that characterize the retention of a compound in gas chromatography. In practice, retention indices are often assumed to depend only on the structure of the molecule and the type of stationary phase, but this approximation is incorrect. This study examines how retention indices depend on the column heating rate in linear temperature-programmed mode, using a large and diverse data set. Most records in the NIST 20 database were acquired in this mode. For stationary phases based on poly(5%-diphenyl-95%-dimethyl)siloxane (5%-phenyl-PDMS), a high proportion of records have heating rates of 10-15 K/min. Such high heating rates are rarely used in practice, so relying on these data may introduce errors. The database was searched for groups of records taken from the same primary source and recorded for the same compound and the same stationary phase, but differing in heating rate. For each group, the value D, the slope (angular coefficient) of the dependence of the retention index on the heating rate, was calculated. D can be either positive or negative. The highest D values and the greatest variation in D are observed for polar stationary phases, but further analysis focused on 5%-phenyl-PDMS because of its greater practical significance. For these stationary phases, the highest D values are observed for aromatic and polyaromatic molecules; oxygen-containing compounds, by contrast, exhibit lower D values. Negative D values are observed for many trimethylsilyl derivatives. A data set of D values for 756 molecules was compiled and published online. D shows almost no correlation with the retention index, the lipophilicity factor logP, or molecular weight. Significant correlations with the number of rings, the number of rotatable bonds, and the number of aromatic atoms were observed.
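The grouping and slope calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes an ordinary least-squares fit (the abstract does not name the fitting method), the record layout and the function names `slope` and `d_values` are hypothetical, and the sample records are made-up placeholders rather than NIST 20 data.

```python
from collections import defaultdict


def slope(points):
    """Least-squares slope of retention index (y) versus heating rate (x).

    Assumption: ordinary least squares; the source only says "slope".
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct heating rates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def d_values(records):
    """Group records by (source, compound, phase) and compute D per group.

    Each record is a tuple (source, compound, phase, rate_K_per_min, ri).
    Groups without at least two distinct heating rates are skipped.
    """
    groups = defaultdict(list)
    for source, compound, phase, rate, ri in records:
        groups[(source, compound, phase)].append((rate, ri))
    result = {}
    for key, points in groups.items():
        if len({rate for rate, _ in points}) >= 2:
            result[key] = slope(points)
    return result


# Made-up placeholder records, for illustration only (not real measurements).
records = [
    ("ref-A", "compound-X", "5%-phenyl-PDMS", 2.0, 1000.0),
    ("ref-A", "compound-X", "5%-phenyl-PDMS", 4.0, 1004.0),
    ("ref-A", "compound-X", "5%-phenyl-PDMS", 6.0, 1008.0),
    ("ref-B", "compound-X", "5%-phenyl-PDMS", 5.0, 1010.0),  # single rate: skipped
]

for key, d in d_values(records).items():
    print(key, d)  # D in retention index units per K/min
```

Keeping the primary source in the grouping key mirrors the abstract's requirement that compared records come from the same source, which avoids mixing systematic differences between laboratories into the slope.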
Linear equations quantitatively relating molecular descriptors to the D value were constructed. The numbers of rings and halogen atoms were shown to contribute positively to D, while the numbers of oxygen atoms and of bonds subject to internal rotation contributed negatively. The strong influence of descriptors related to the conformational rigidity of molecules and the weak influence of polarity suggest that the entropic factor has a key influence on D. A simple empirical linear equation for estimating D is derived and presented in this study. Several machine learning methods for predicting D are compared. The best results are obtained with gradient boosting and random forest. However, random forest does not achieve high accuracy in predicting the retention indices themselves.
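The sign pattern reported for the linear equations can be written schematically as follows. This is only a sketch: the symbols and the restriction to four descriptors are assumptions made here for illustration, and since the abstract gives no numerical coefficients, none are shown.

```latex
D \approx c_0
  + c_{\mathrm{ring}} N_{\mathrm{ring}}
  + c_{\mathrm{hal}} N_{\mathrm{hal}}
  - c_{\mathrm{O}} N_{\mathrm{O}}
  - c_{\mathrm{rot}} N_{\mathrm{rot}},
\qquad c_{\mathrm{ring}},\, c_{\mathrm{hal}},\, c_{\mathrm{O}},\, c_{\mathrm{rot}} > 0
```

Here N_ring, N_hal, N_O, and N_rot stand for the numbers of rings, halogen atoms, oxygen atoms, and rotatable bonds in the molecule, and c_0 is an intercept.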