Theertham Bhargav, Wang Jenna L, Fang Jianwen, Lushington Gerald H
Center of Excellence for Chemical Methodology and Library Development, The University of Kansas, Lawrence, KS 66045, USA.
Curr Comput Aided Drug Des. 2008 Mar 1;4(1):23-34. doi: 10.2174/157340908783769292.
Despite great advances in the efficiency of analytical and synthetic chemistry, time and available starting material still limit the number of unique compounds that can be practically synthesized and evaluated as prospective therapeutics. Chemical diversity analysis (the capacity to identify finite diverse subsets that reliably represent greater manifolds of drug-like chemicals) thus remains an important resource in drug discovery. Despite an unproven track record, chemical diversity has also been used to posit, from preliminary screen hits, new compounds with similar or better activity. Identifying diversity metrics that demonstrably encode bioactivity trends is thus of substantial potential value for intelligent assembly of targeted screens. This paper reports novel algorithms designed to simultaneously reflect chemical similarity or diversity trends and apparent bioactivity in compound collections. An extensive set of descriptors is evaluated within large NCI screening data sets according to bioactivity differentiation capacity, quantified as the ability to co-localize known active species into bioactive-rich K-means clusters. One method tested for descriptor selection orders features according to relative variance across a set of training compounds, and samples increasingly fine subset meshes for descriptors whose exclusion from the model induces drastic drops in relative bioactive colocalization. This yields metrics with reasonable bioactive enrichment (greater than 50% of all bioactive compounds collected into clusters or cells with significantly enriched active/inactive rates) for each of the four data sets examined herein. A second method replaces variance by an active/inactive divergence score, achieving comparable enrichment via a much more efficient search process.
Combinations of the above metrics are tested in 2D rectilinear diversity models, achieving similarly successful colocalization statistics, with metrics derived from the active/inactive divergence score typically outperforming those selected from the variance criterion and computed from the DiverseSolutions software.
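
The colocalization statistic described above (the share of all bioactive compounds that land in clusters or cells with enriched active rates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `colocalization`, the `enrichment_factor` parameter, and the rule that a cluster counts as "enriched" when its active rate is at least a fixed multiple of the overall active rate are all assumptions made for illustration; the abstract does not state how "significantly enriched" was defined. Cluster labels are taken as given and could come from K-means or from cell indices of a rectilinear grid.

```python
from collections import Counter


def colocalization(labels, active, enrichment_factor=2.0):
    """Fraction of all actives that fall in 'enriched' clusters.

    labels: cluster (or grid cell) id for each compound.
    active: truthy for bioactive compounds, falsy for inactive ones.
    enrichment_factor: ASSUMED threshold; a cluster is treated as enriched
        when its active rate is >= enrichment_factor * overall active rate.
    """
    if len(labels) != len(active):
        raise ValueError("labels and active must have the same length")

    n_active = sum(1 for a in active if a)
    if n_active == 0:
        return 0.0  # nothing to co-localize

    base_rate = n_active / len(labels)

    # Cluster sizes and number of actives per cluster.
    size = Counter(labels)
    hits = Counter(lab for lab, a in zip(labels, active) if a)

    # Clusters whose active rate clears the assumed enrichment threshold.
    enriched = {
        c for c in size
        if hits[c] / size[c] >= enrichment_factor * base_rate
    }

    captured = sum(hits[c] for c in enriched)
    return captured / n_active


# Toy data (hypothetical): three clusters of four compounds each.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
active = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(colocalization(labels, active))  # 0.75
```

In this toy case only cluster 0 clears the threshold, so three of the four actives are captured. Under the abstract's criterion, a descriptor set would count as having reasonable enrichment when this fraction exceeds 0.5.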