Liu Ruifeng, AbdulHameed Mohamed Diwan M, Wallqvist Anders
Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick, MD, United States.
The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, MD, United States.
Front Chem. 2019 Oct 23;7:701. doi: 10.3389/fchem.2019.00701. eCollection 2019.
High-throughput screening (HTS) is an important component of lead discovery, with virtual screening playing an increasingly important role. Both methods typically suffer from a lack of sensitivity and specificity against their true biological targets. With ever-increasing screening libraries and virtual compound collections, it is now feasible to conduct follow-up experimental testing on only a small fraction of hits. In this context, advances in virtual screening that enrich true actives among the top-ranked compounds ("early recognition"), and hence reduce the number of hits to test, are highly desirable. The standard ligand-based virtual screening method for large compound libraries is a molecular similarity search that ranks a compound's likelihood of being active against a drug target by its highest Tanimoto similarity to known active compounds. This approach assumes that the distributions of Tanimoto similarity values to all active compounds are identical (i.e., same mean and standard deviation), an assumption shown to be invalid (Baldi and Nasr, 2010). Here, we introduce two methods that improve early recognition of actives by exploiting similarity information of all molecules. The first method ranks a compound by its highest z-score instead of its highest Tanimoto similarity, and the second by an aggregated score calculated from its Tanimoto similarity values to all known actives and inactives (or a large number of structurally diverse molecules when information on inactives is unavailable). Our evaluations, which use datasets of over 20 HTS campaigns downloaded from PubChem, indicate that, compared to the conventional approach, both methods achieve a ~10% higher Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) score, a metric of early recognition. Given the increasing use of virtual screening in early lead discovery, these methods provide a straightforward means to enhance early recognition.
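The difference between conventional maximum-similarity ranking and the highest-z-score ranking described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact z-score formula, so the sketch assumes that each Tanimoto similarity to a known active is standardized by the mean and standard deviation of that active's similarities to a reference set of molecules. The function names, the representation of fingerprints as sets of on-bit indices, and the fallback for a zero standard deviation are all illustrative assumptions, not the authors' implementation. The aggregated-score method is not sketched because the abstract does not specify how the score is computed.

```python
from statistics import mean, pstdev


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity_scores(library, actives):
    """Conventional ranking score: highest Tanimoto similarity to any known active."""
    return [max(tanimoto(fp, act) for act in actives) for fp in library]


def max_zscore_scores(library, actives, reference):
    """Assumed z-score variant: standardize similarities per active, then take the max.

    For each active, the mean and standard deviation of its similarities to the
    `reference` molecules are computed first (hypothetical choice of reference set).
    """
    stats = []
    for act in actives:
        sims = [tanimoto(ref, act) for ref in reference]
        mu = mean(sims)
        sigma = pstdev(sims)
        # Assumption: fall back to sigma = 1 when all reference similarities are equal.
        stats.append((mu, sigma if sigma > 0 else 1.0))
    return [
        max((tanimoto(fp, act) - mu) / sigma for act, (mu, sigma) in zip(actives, stats))
        for fp in library
    ]


# Toy fingerprints, invented purely for illustration.
actives = [{1, 2, 3, 4}, {10, 11}]
reference = [{1, 2, 3, 5}, {1, 2, 6, 7}, {10, 12}, {20, 21}]
library = [{1, 2, 3, 4}, {10, 11, 12}]

by_similarity = max_similarity_scores(library, actives)
by_zscore = max_zscore_scores(library, actives, reference)
```

Sorting the library by either score list in descending order gives the corresponding ranking. The two rankings can differ because an active whose similarities to most molecules are high contributes less after standardization than one whose similarities are usually low.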