Global Discovery Chemistry, Novartis Institutes for Biomedical Research, Basel, Switzerland.
PLoS One. 2013 Apr 16;8(4):e61007. doi: 10.1371/journal.pone.0061007. Print 2013.
The biochemical half maximal inhibitory concentration (IC50) is the most commonly used metric for on-target activity in lead optimization. It is used to guide lead optimization and, based on public data, to build large-scale chemogenomics analyses and models of off-target activity and toxicity. However, the use of public biochemical IC50 data is problematic, because IC50 values are assay specific and comparable only under certain conditions. For large-scale analysis it is not feasible to check each data entry manually, and it is very tempting to mix all available IC50 values from public databases even if assay information is not reported. As previously reported for the analysis of Ki data, we first analyzed the types of errors, the redundancy and the variability found in the ChEMBL IC50 data. To assess the variability of IC50 data measured independently in two different laboratories, we searched ChEMBL for cases in which at least ten IC50 values for identical protein-ligand systems against the same target had been measured in both laboratories. Because too few cases of this type are available, the variability of IC50 data was instead assessed by comparing all pairs of independent IC50 measurements on identical protein-ligand systems. The standard deviation of IC50 data is only 25% larger than the standard deviation of Ki data, suggesting that mixing IC50 data from different assays, even without knowing the details of the assay conditions, adds only a moderate amount of noise to the overall data. As expected, the standard deviation of public ChEMBL IC50 data was greater than the standard deviation of in-house intra-laboratory/inter-day IC50 data. Augmenting mixed public IC50 data with public Ki data does not degrade the quality of the mixed IC50 data, provided the Ki values are corrected by an offset. For a broad dataset such as the ChEMBL database, a Ki-IC50 conversion factor of 2 was found to be the most reasonable.
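The two quantitative steps in the abstract, estimating variability from pairs of independent measurements and correcting Ki values by a fixed factor, can be illustrated with a minimal Python sketch. It is not the authors' code, and several things in it are assumptions: the function names and the example identifiers are hypothetical, values are taken to be in nM and compared on the negative log10 molar scale (pIC50), the per-measurement standard deviation is derived from pair differences by assuming both measurements in a pair carry the same independent error (so the variance of a difference is twice the single-measurement variance), and the factor of 2 is applied as IC50 ≈ 2 × Ki, the direction suggested by the Cheng-Prusoff relation for competitive inhibition.

```python
import math
from itertools import combinations


def pairwise_differences(measurements):
    """Collect pIC50 differences for all pairs of measurements
    made on the same protein-ligand system.

    `measurements` maps a (ligand, target) key to a list of pIC50 values
    that are assumed to come from independent sources.
    """
    diffs = []
    for values in measurements.values():
        for a, b in combinations(values, 2):
            diffs.append(a - b)
    return diffs


def sigma_from_pairs(diffs):
    """Estimate the standard deviation of a single measurement.

    Assumes each difference is between two measurements with equal,
    independent errors, so Var(a - b) = 2 * sigma**2.
    """
    if not diffs:
        raise ValueError("no measurement pairs available")
    mean_square = sum(d * d for d in diffs) / len(diffs)
    return math.sqrt(mean_square / 2.0)


def ki_to_pic50(ki_nm, factor=2.0):
    """Convert a Ki in nM to an approximate pIC50.

    Assumes IC50 = factor * Ki; the default of 2 follows the abstract,
    the direction of the correction is an assumption of this sketch.
    """
    ic50_nm = ki_nm * factor
    return 9.0 - math.log10(ic50_nm)  # 1 nM = 1e-9 M


if __name__ == "__main__":
    # Hypothetical example data, not taken from ChEMBL.
    example = {
        ("LIGAND_A", "TARGET_1"): [7.0, 7.4],
        ("LIGAND_B", "TARGET_1"): [6.0, 6.2, 6.6],
    }
    diffs = pairwise_differences(example)
    print("pairs:", len(diffs))
    print("estimated sigma (pIC50 units):", round(sigma_from_pairs(diffs), 3))
    print("pIC50 from Ki = 10 nM:", round(ki_to_pic50(10.0), 3))
```

On the log scale, a factor of 2 corresponds to a constant offset of about 0.3 log units between pKi and pIC50. Note that when a system has more than two measurements, the pairs formed from it are not fully independent of one another, which a more careful analysis would need to account for.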