Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA.
Proc Natl Acad Sci U S A. 2011 Apr 26;108(17):6817-22. doi: 10.1073/pnas.1015024108. Epub 2011 Apr 11.
Using a diverse collection of small molecules, we recently found that compound sets from different sources (commercial, academic, natural) have different protein-binding behaviors, and that these behaviors correlate with trends in stereochemical complexity for these compound sets. These results lend insight into structural features that synthetic chemists might target when synthesizing screening collections for biological discovery. We report extensive characterization of structural properties and diversity of biological performance for these compounds and expand comparative analyses to include physicochemical properties and three-dimensional shapes of predicted conformers. The results highlight additional similarities and differences between the sets, as well as the dependence of such comparisons on the choice of molecular descriptors. Using a protein-binding dataset, we introduce an information-theoretic measure to assess diversity of performance with a constraint on specificity. Rather than relying on finding individual active compounds, this measure allows rational judgment of compound subsets as groups. We also apply this measure to publicly available data from ChemBank for the same compound sets across a diverse group of functional assays. We find that performance diversity of compound sets, as judged by this measure, is relatively stable across a range of property values, both in protein-binding studies and in functional assays. Because building screening collections with improved performance depends on efficient use of synthetic organic chemistry resources, these studies illustrate an important quantitative framework to help prioritize choices made in building such collections.
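The abstract does not define the information-theoretic measure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a binary compound-by-assay activity matrix, treats "specificity" as a cap on the number of assays a compound may hit (the hypothetical parameter `max_hits`), and scores a compound subset by the Shannon entropy of how its hits are spread across assays. The function name `performance_diversity` and all inputs are assumptions made for illustration.

```python
import math


def performance_diversity(activity, max_hits=2):
    """Shannon entropy (bits) of hits across assays for a compound subset.

    activity: list of rows, one per compound; each row is a list of 0/1
              flags, one per assay (1 = active). Assumed input format.
    max_hits: hypothetical specificity constraint; compounds active in
              more than this many assays are ignored as promiscuous.
    """
    if not activity:
        return 0.0
    n_assays = len(activity[0])
    counts = [0] * n_assays
    for row in activity:
        hits = sum(row)
        # Keep only compounds that are active but still specific enough.
        if 0 < hits <= max_hits:
            for j, flag in enumerate(row):
                counts[j] += flag
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


# Four specific compounds, each hitting a different assay: hits are spread evenly.
spread = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Four compounds all hitting the same assay: no diversity of performance.
clustered = [[1, 0, 0, 0]] * 4

print(performance_diversity(spread))
print(performance_diversity(clustered))
```

Under these assumptions, a subset whose specific hits are spread evenly over many assays scores higher than one whose hits concentrate on a single assay, which lets subsets be compared as groups rather than by individual active compounds.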