Department of Chemistry, Wayne State University, Detroit, MI, 48202, USA.
J Chromatogr A. 2023 Mar 15;1692:463851. doi: 10.1016/j.chroma.2023.463851. Epub 2023 Feb 8.
The distribution of neutral compounds in biphasic separation systems can be described by the solvation parameter model using six solute properties, or descriptors. These descriptors are McGowan's characteristic volume, V (a measure of size); excess molar refraction, E; dipolarity/polarizability, S; hydrogen-bond acidity, A, and basicity, B; and the gas-liquid partition constant on n-hexadecane at 298.15 K, L. McGowan's characteristic volume and the excess molar refraction for liquids are available by calculation (E requires an experimental refractive index). The other descriptors, and the excess molar refraction for solids, are experimental quantities subject to greater variation, or are estimated using computational or empirical models. Solute descriptors for several thousand compounds are available in the Abraham descriptor database and for several hundred compounds in the WSU descriptor database. These publicly accessible databases were developed independently using different approaches and, for many compounds, provide different descriptor values. In this report we evaluate the effect of mixing descriptors from the two databases on modeling chromatographic retention factors and liquid-liquid partition constants. It is shown that the two descriptor databases are not interchangeable. The WSU descriptor database consistently demonstrates improved model quality as determined by statistical parameters. Model system constants exhibit a general dependence on database selection, with an approximately linear trend as a function of the fraction of compounds assigned descriptors from either database. There is no general model performance advantage to using mixed descriptor datasets, and no real cause for concern for relatively large datasets containing < 15% of compounds with descriptors assigned from the other database. For small datasets, descriptor quality is an important variable for adequate model performance.
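The kind of comparison described above can be illustrated with a minimal sketch. It assumes the commonly used linear form of the solvation parameter model for transfer between condensed phases, log k = c + eE + sS + aA + bB + vV (for gas-to-condensed-phase transfer, L replaces V), fitted by ordinary least squares. The function names `fit_solvation_model` and `mix_descriptors` are hypothetical, and every number below is a synthetic placeholder rather than a value from either descriptor database or from this report.

```python
import numpy as np


def fit_solvation_model(descriptors, log_k):
    """Least-squares fit of log k = c + e*E + s*S + a*A + b*B + v*V.

    descriptors: (n, 5) array with columns E, S, A, B, V.
    Returns the system constants (c, e, s, a, b, v) and the
    standard error of the estimate.
    """
    X = np.column_stack([np.ones(len(descriptors)), descriptors])
    coef, *_ = np.linalg.lstsq(X, log_k, rcond=None)
    resid = log_k - X @ coef
    dof = len(log_k) - X.shape[1]
    see = np.sqrt(resid @ resid / dof)
    return coef, see


def mix_descriptors(primary, other, fraction, rng):
    """Replace a given fraction of compounds (rows) in `primary`
    with the descriptor rows for the same compounds from `other`."""
    n = len(primary)
    k = int(round(fraction * n))
    idx = rng.choice(n, size=k, replace=False)
    mixed = primary.copy()
    mixed[idx] = other[idx]
    return mixed


# Synthetic stand-ins for two descriptor databases (assumption: the second
# differs from the first by small random offsets for every compound).
rng = np.random.default_rng(0)
n = 60
db_a = rng.uniform(0.0, 1.5, size=(n, 5))
db_b = db_a + rng.normal(0.0, 0.05, size=(n, 5))

# Synthetic system constants (c, e, s, a, b, v) and noise-free responses.
true_constants = np.array([-0.5, 0.2, -0.6, -0.4, -1.8, 1.9])
log_k = true_constants[0] + db_a @ true_constants[1:]

for fraction in (0.0, 0.15, 0.5, 1.0):
    mixed = mix_descriptors(db_a, db_b, fraction, rng)
    coef, see = fit_solvation_model(mixed, log_k)
    print(f"fraction={fraction:.2f}  see={see:.4f}  v={coef[-1]:.3f}")
```

Tracking the fitted constants and the standard error of the estimate against the mixing fraction mirrors the type of analysis summarized in the abstract; the synthetic setup here says nothing about which real database performs better.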