Bayesian similarity searching in high-dimensional descriptor spaces combined with Kullback-Leibler descriptor divergence analysis.

Authors

Vogt Martin, Bajorath Jürgen

Affiliation

Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany.

Publication information

J Chem Inf Model. 2008 Feb;48(2):247-55. doi: 10.1021/ci700333t. Epub 2008 Jan 30.

Abstract

We investigate an approach that combines Bayesian modeling of probability distributions of descriptor values of active and database molecules with Kullback-Leibler analysis of the divergence between these distributions. The methodology is used for Bayesian screening and also to predict compound recall rates. In our study, we analyze two fundamental approximations underlying the Bayesian screening approach: the assumption that descriptors are independent of each other and the assumption that their data set values follow normal distributions. In addition, we calculate Kullback-Leibler divergence for single descriptors, rather than multiple-feature distributions, in order to prioritize descriptors for screening calculations. The results show that descriptor correlation effects, which violate the assumption of feature independence, can lead to a notable reduction of compound recall in Bayesian screening. Controlling descriptor correlation effects plays a much more significant role in achieving high recall rates than approximating descriptor distributions by Gaussians. Furthermore, Kullback-Leibler divergence analysis is shown to systematically identify descriptors that are the most relevant for the outcome of Bayesian screening calculations.
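The two approximations named in the abstract (independent descriptors, normally distributed values) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their scoring function, so the log-likelihood ratio used here is an assumed, standard naive Bayes form, and the function names `gaussian_kl`, `log_likelihood_ratio`, and `rank_descriptors` are hypothetical. Each descriptor is summarized by a (mean, standard deviation) pair for the active set and for the database.

```python
import math


def gaussian_kl(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form KL divergence D(A || B) between two univariate normal distributions."""
    return (math.log(sigma_b / sigma_a)
            + (sigma_a ** 2 + (mu_a - mu_b) ** 2) / (2 * sigma_b ** 2)
            - 0.5)


def log_likelihood_ratio(x, active_params, database_params):
    """Score a compound as sum_i [log p(x_i | active) - log p(x_i | database)].

    Assumes (as an illustration only) that descriptors are independent and
    Gaussian, so the joint log-density is a sum of per-descriptor terms.
    """
    total = 0.0
    for xi, (mu_a, s_a), (mu_b, s_b) in zip(x, active_params, database_params):
        total += (math.log(s_b / s_a)
                  - (xi - mu_a) ** 2 / (2 * s_a ** 2)
                  + (xi - mu_b) ** 2 / (2 * s_b ** 2))
    return total


def rank_descriptors(active_params, database_params):
    """Return descriptor indices sorted by single-descriptor KL divergence, largest first."""
    divergences = [
        gaussian_kl(mu_a, s_a, mu_b, s_b)
        for (mu_a, s_a), (mu_b, s_b) in zip(active_params, database_params)
    ]
    return sorted(range(len(divergences)), key=lambda i: divergences[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy parameters: two descriptors, (mean, std) per descriptor.
    active = [(0.0, 1.0), (2.0, 1.0)]
    database = [(0.0, 1.0), (0.0, 1.0)]
    print(rank_descriptors(active, database))
    print(log_likelihood_ratio([0.0, 2.0], active, database))
```

In this sketch, a descriptor whose active-set distribution coincides with the database distribution has zero divergence and contributes nothing to the ranking, which mirrors the idea of prioritizing descriptors by single-descriptor divergence. Because the score is a plain sum over descriptors, strongly correlated descriptors are effectively counted more than once, which is one way to see why correlation effects can degrade recall under the independence assumption.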
