Fukunishi Yoshifumi
Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, 2-41-6 Aomi, Koto-ku, Tokyo, Japan.
Comb Chem High Throughput Screen. 2009 May;12(4):397-408. doi: 10.2174/138620709788167890.
The initial stage of drug development is the search for hit (active) compounds in a pool of millions of compounds; in silico (virtual) screening has been successfully applied to this process. One problem with in silico screening, however, is its low hit ratio relative to its high computational cost and long CPU time. This problem is most serious in structure-based in silico screening, mainly because protein-compound binding free energy is estimated with low accuracy. The problem with ligand-based in silico screening is that the conventional quantitative structure-activity relationship (QSAR) approach is not effective at predicting new hit compounds with new scaffolds. Recently, machine-learning approaches have been applied to in silico drug screening to overcome these problems. We review here machine-learning approaches for both structure-based and ligand-based drug screening. Machine learning is used to improve database enrichment in two ways: by improving the docking score calculated by the protein-compound docking program, and by calculating the optimal distance between the feature vectors of active and inactive compounds. Both approaches require compounds that are known to be active against the target protein. In structure-based screening, the former approach is mainly used, together with a protein-compound affinity matrix. In ligand-based screening, both approaches are used, and the latter can be applied to various kinds of descriptors, such as 1D/2D descriptors and fingerprints, and the affinity fingerprint given by the protein-compound affinity matrix.
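The idea of an affinity fingerprint and of database enrichment can be illustrated with a small sketch. It is not the method of any particular study covered by the review: it assumes that each row of a protein-compound affinity matrix is used as a compound's fingerprint, and it swaps in a plain nearest-neighbour Euclidean distance to the known actives in place of a learned, optimized distance. The docking scores are synthetic, and the function names (make_toy_affinity_matrix, rank_by_affinity_fingerprint, enrichment_factor) are hypothetical.

```python
import numpy as np


def make_toy_affinity_matrix(n_compounds=100, n_proteins=8, n_actives=10, seed=0):
    """Build synthetic docking scores (hypothetical data; lower = stronger binding)."""
    rng = np.random.default_rng(seed)
    scores = rng.normal(0.0, 1.0, size=(n_compounds, n_proteins))
    is_active = np.zeros(n_compounds, dtype=bool)
    is_active[:n_actives] = True
    # Assumption for illustration: actives bind the first four proteins
    # much more strongly than inactive compounds do.
    scores[:n_actives, :4] -= 10.0
    return scores, is_active


def rank_by_affinity_fingerprint(scores, known_active_idx):
    """Rank compounds by distance to the nearest known active.

    scores has shape (n_compounds, n_proteins); each row is treated as
    the affinity fingerprint of one compound.
    """
    # Standardize each protein column so that no single protein dominates.
    z = (scores - scores.mean(axis=0)) / (scores.std(axis=0) + 1e-12)
    # Pairwise differences between every compound and every known active.
    diffs = z[:, None, :] - z[known_active_idx][None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    # Smallest distance first: most similar to the known actives.
    return np.argsort(nearest, kind="stable")


def enrichment_factor(ranking, is_active, top_fraction=0.1):
    """Hit rate in the top fraction of the ranking divided by the overall hit rate."""
    n_top = max(1, int(round(top_fraction * len(ranking))))
    hit_rate_top = is_active[ranking[:n_top]].mean()
    hit_rate_all = is_active[ranking].mean()
    return hit_rate_top / hit_rate_all


scores, is_active = make_toy_affinity_matrix()
known = np.array([0, 1, 2])  # actives assumed to be known in advance
ranking = rank_by_affinity_fingerprint(scores, known)
# Evaluate only on compounds that were not used as known actives.
held_out = ranking[~np.isin(ranking, known)]
print("enrichment factor in the top 10%:", enrichment_factor(held_out, is_active))
```

An enrichment factor above 1 means that the top of the ranked list contains more actives than a random selection of the same size would. On real data the separation is far weaker than in this toy set, which is the gap the machine-learning approaches described above aim to close.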