Berenger Francois, Vu Oanh, Meiler Jens
Department of Chemistry, Vanderbilt University, Nashville, TN, USA.
Division of System Cohort, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan.
J Cheminform. 2017 Nov 28;9(1):60. doi: 10.1186/s13321-017-0248-5.
In ligand-based virtual screening experiments, a known active ligand is used in similarity searches to find putative active compounds for the same protein target. When there are several known active molecules, screening using all of them is more powerful than screening using a single ligand. A consensus query can be created by either screening serially with different ligands before merging the obtained similarity scores, or by combining the molecular descriptors (i.e. chemical fingerprints) of those ligands.
We report on the discriminative power and speed of several consensus methods, on two datasets made only of experimentally verified molecules. The two datasets contain a total of 19 protein targets, 3776 known active and ~2 × 10^6 inactive molecules. Three chemical fingerprints are investigated: MACCS 166 bits, ECFP4 2048 bits and an unfolded version of MOLPRINT2D. Four different consensus policies and five consensus sizes were benchmarked.
The best consensus method is to rank candidate molecules by the maximum score that each candidate obtains against all known actives. When the number of actives used is small, a consensus fingerprint can approach the same screening performance. However, if the computational exploration of chemical space is limited by speed (i.e. throughput), a consensus fingerprint can outperform this consensus of scores.
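The two families of consensus query described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that fingerprints are stored as sets of on-bit indices, that similarity is the Tanimoto coefficient, and that the consensus fingerprint is a simple union (bitwise OR) of the actives' fingerprints, which may differ from the policies benchmarked in the paper. All function names and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two bit-set fingerprints."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def max_score(candidate: Fingerprint, actives: Sequence[Fingerprint]) -> float:
    """Consensus of scores: best similarity of the candidate to any known active.

    Costs one comparison per known active for every candidate.
    """
    return max(tanimoto(candidate, active) for active in actives)


def union_fingerprint(actives: Sequence[Fingerprint]) -> Fingerprint:
    """Consensus fingerprint (assumed policy): union of all actives' on bits.

    Costs one comparison per candidate once the query is built.
    """
    return frozenset().union(*actives)


def rank(scores: Dict[str, float]) -> List[str]:
    """Candidate names sorted from highest to lowest score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Toy data, invented for illustration only.
actives = [frozenset({1, 2, 3}), frozenset({3, 4, 5})]
candidates = {
    "c1": frozenset({1, 2, 3}),
    "c2": frozenset({3, 4}),
    "c3": frozenset({7, 8}),
}

# Policy 1: maximum score against all known actives.
scores_max = {name: max_score(fp, actives) for name, fp in candidates.items()}

# Policy 2: a single comparison against the consensus fingerprint.
query = union_fingerprint(actives)
scores_union = {name: tanimoto(fp, query) for name, fp in candidates.items()}

print(rank(scores_max))
print(rank(scores_union))
```

In this sketch the maximum-score policy needs as many similarity computations per candidate as there are known actives, whereas the consensus fingerprint needs only one. That difference in cost is consistent with the throughput argument made in the abstract.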