Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, United Kingdom.
Chem Res Toxicol. 2022 Nov 21;35(11):1997-2013. doi: 10.1021/acs.chemrestox.2c00199. Epub 2022 Oct 27.
The discovery of carcinogenic nitrosamine impurities above safe limits in pharmaceuticals has created an urgent need for methods that extend structure-activity relationship (SAR) analyses from relatively limited datasets. The level of confidence required in such SAR means there is significant value in investigating the effect of individual substructural features in a statistically robust manner. This is challenging on a small dataset because, in practice, compounds contain a mixture of features, which may confound both expert SAR and statistical quantitative structure-activity relationship (QSAR) methods. Beyond this confounding by other functionality, isolating the effect of a single structural feature is complicated by the difficulty of establishing statistical significance when many candidate variables are tested concurrently on a small dataset; a naïve QSAR model does not find any feature to be significant after correction for multiple testing. We propose a variation on Bayesian multiple linear regression that estimates the effect of each feature simultaneously yet independently, taking into account the combinations of features present in the dataset and reducing the impact of multiple testing, and we show that some features have a statistically significant impact. The method can provide statistically robust validation of expert SAR approaches to the differences in potency between structural groupings of nitrosamines. It can isolate the structural features that lead to the highest and lowest carcinogenic potency, and novel nitrosamine compounds can be assigned to potency categories with high accuracy.
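To make the general idea concrete, the following is a minimal sketch of a standard conjugate Bayesian linear regression on binary structural-feature indicators. It is not the variation proposed in the paper, whose details are not given in this abstract. The function names, the fixed noise standard deviation, the Gaussian prior width and the synthetic data are all assumptions chosen purely for illustration.

```python
import numpy as np


def bayesian_feature_effects(X, y, noise_sd=0.3, prior_sd=1.0):
    """Posterior mean and sd of per-feature effects (illustrative only).

    X: (n_compounds, n_features) 0/1 matrix of hypothetical structural features.
    y: (n_compounds,) potency values on some log scale (assumed).
    Assumes a known noise sd and an independent N(0, prior_sd^2) prior on
    each effect; both are simplifications made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centring X and y removes the intercept from the problem.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Conjugate Gaussian posterior: precision = X'X / s^2 + I / t^2.
    precision = Xc.T @ Xc / noise_sd**2 + np.eye(X.shape[1]) / prior_sd**2
    cov = np.linalg.inv(precision)
    mean = cov @ (Xc.T @ yc) / noise_sd**2
    sd = np.sqrt(np.diag(cov))
    return mean, sd


def interval_excludes_zero(mean, sd, z=1.96):
    """True where the approximate 95% credible interval excludes zero."""
    return np.abs(mean) > z * sd


# Synthetic demonstration data (not real nitrosamine data).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 8))
true_effects = np.array([1.5, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = 3.0 + X @ true_effects + rng.normal(0.0, 0.3, size=60)

mean, sd = bayesian_feature_effects(X, y)
flagged = interval_excludes_zero(mean, sd)
for j, (m, s, f) in enumerate(zip(mean, sd, flagged)):
    print(f"feature {j}: effect = {m:+.2f} +/- {s:.2f}, credible = {bool(f)}")
```

Because all coefficients are estimated jointly, each posterior reflects which feature combinations actually occur in the dataset, and the shared prior shrinks weakly supported effects towards zero. That shrinkage is one common way a Bayesian model tempers the multiple-testing problem, though the paper's own mechanism may differ.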