Laurent Jacob, Brice Hoffmann, Véronique Stoven, Jean-Philippe Vert
Mines ParisTech, Centre for Computational Biology, 35 rue Saint-Honoré, F-77305, Fontainebleau, France.
BMC Bioinformatics. 2008 Sep 6;9:363. doi: 10.1186/1471-2105-9-363.
The G protein-coupled receptor (GPCR) superfamily is currently the largest class of therapeutic targets. In silico prediction of interactions between GPCRs and small molecules in the transmembrane ligand-binding site is therefore a crucial step in the drug discovery process. This prediction remains a daunting task, both because the 3D structure of most GPCRs is difficult to characterize and because few ligands are known for some members of the superfamily. Chemogenomics, which attempts to characterize interactions between all members of a target class and all small molecules simultaneously, has recently been proposed as an interesting alternative to traditional docking or ligand-based virtual screening strategies.
We show that interaction prediction in the chemogenomics framework is more accurate than state-of-the-art individual ligand-based methods, both for receptors with known ligands and for receptors without known ligands. This is achieved without any knowledge of the receptor 3D structure. In particular, we are able to predict ligands of orphan GPCRs with an estimated accuracy of 78.1%.
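To see why a chemogenomics model can score ligands for a receptor with no known ligands at all, consider a kernel method whose score for a (target, ligand) pair is a weighted sum of pair-kernel values against training pairs. The sketch below assumes that the pair kernel is the product of a target kernel and a ligand kernel; the Tanimoto ligand kernel, the toy fingerprints, the weights `alpha`, and all function names are illustrative assumptions, not the paper's data or code. With a Dirac target kernel (each receptor modeled independently, as in individual ligand-based methods), an orphan receptor receives a score of zero for every ligand; with a target kernel that shares information across receptors, it does not.

```python
# Hypothetical illustration: scoring a ligand for an orphan receptor with a
# product pair kernel K((t, l), (t2, l2)) = K_T(t, t2) * K_L(l, l2).

def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def dirac(t1: str, t2: str) -> float:
    """Receptors share nothing unless identical (one independent model each)."""
    return 1.0 if t1 == t2 else 0.0

def multitask(t1: str, t2: str) -> float:
    """Constant term plus Dirac: every receptor borrows from all the others."""
    return 1.0 + dirac(t1, t2)

def score(target, ligand, training, alpha, target_kernel):
    """Kernel expansion f(t, l) = sum_i alpha_i * K_T(t, t_i) * K_L(l, l_i)."""
    return sum(
        a * target_kernel(target, t_i) * tanimoto(ligand, l_i)
        for a, (t_i, l_i) in zip(alpha, training)
    )

# Toy training pairs (receptor name, fingerprint); positive weights stand for
# binders, negative weights for non-binders. All values are made up.
training = [
    ("R1", {1, 2, 3, 4}),
    ("R1", {7, 8, 9}),
    ("R2", {1, 2, 3, 5}),
    ("R2", {8, 9, 10}),
]
alpha = [1.0, -1.0, 1.0, -1.0]

query = {1, 2, 3}  # candidate ligand for the orphan receptor "R_orphan"
print(score("R_orphan", query, training, alpha, dirac))      # 0.0
print(score("R_orphan", query, training, alpha, multitask))  # 1.5
```

With the constant-plus-Dirac kernel, the orphan's score reduces to a ligand-based score pooled over all receptors; a target kernel that encodes receptor similarity (for example, one built from the receptor classification or binding-pocket residues) would instead weight closely related receptors more heavily.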
We propose new methods for in silico chemogenomics and validate them on the virtual screening of GPCRs. The methods extend a recently proposed machine learning strategy based on support vector machines (SVM), which provides a flexible framework for incorporating various sources of information about the biological space of targets and the chemical space of small molecules. We investigate the use of 2D and 3D descriptors for small molecules and test a variety of descriptors for GPCRs. We show that incorporating information about the known hierarchical classification of the target family and about key residues in their inferred binding pockets significantly improves the prediction accuracy of our model.
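One way to encode the known hierarchical classification of the target family in a kernel is to represent each receptor by the nodes on its path through the classification tree and count the nodes two receptors share. The sketch below assumes that form and combines it with a Tanimoto kernel on binary 2D fingerprints by multiplying the two kernels for each (receptor, ligand) pair, producing a Gram matrix that an SVM accepting precomputed kernels could be trained on. The receptor paths, fingerprints and function names are hypothetical placeholders; this is not the authors' implementation.

```python
# Hypothetical sketch: hierarchy kernel on receptors x Tanimoto kernel on ligands.

def hierarchy_kernel(path_a, path_b):
    """Number of classification-tree nodes shared by two receptors.

    Each receptor is a tuple of labels from the root down to the receptor,
    e.g. (class, family, subfamily, receptor). A node is identified by its
    full prefix, so counting the shared prefix equals the inner product of
    0/1 ancestor-indicator vectors (hence a valid kernel).
    """
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared

def tanimoto(a, b):
    """Tanimoto similarity between fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def pair_gram(pairs, target_kernel, ligand_kernel):
    """Gram matrix of the product kernel over (receptor, ligand) pairs."""
    return [
        [target_kernel(t1, t2) * ligand_kernel(l1, l2) for (t2, l2) in pairs]
        for (t1, l1) in pairs
    ]

# Placeholder classification paths (not a real GPCR classification).
beta1 = ("A", "amine", "adrenergic", "beta1")
beta2 = ("A", "amine", "adrenergic", "beta2")
orphan = ("A", "amine", "unknown", "orphan1")

pairs = [
    (beta1, frozenset({1, 2, 3, 4})),
    (beta2, frozenset({1, 2, 3})),
    (orphan, frozenset({1, 2, 3})),
]
gram = pair_gram(pairs, hierarchy_kernel, tanimoto)
for row in gram:
    print(row)
# [4.0, 2.25, 1.5]
# [2.25, 4.0, 2.0]
# [1.5, 2.0, 4.0]
```

Taking the product keeps the pair kernel positive semidefinite whenever both factors are (Schur product theorem), so the matrix can be handed directly to a kernel SVM; in scikit-learn, for instance, via `SVC(kernel="precomputed")`.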