Sidorov Pavel, Gaspar Helena, Marcou Gilles, Varnek Alexandre, Horvath Dragos
Laboratoire de Chémoinformatique, UMR 7140, CNRS-Univ. Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France.
Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia.
J Comput Aided Mol Des. 2015 Dec;29(12):1087-108. doi: 10.1007/s10822-015-9882-z. Epub 2015 Nov 12.
Intuitive visual rendering (mapping) of high-dimensional chemical spaces (CS) is an important topic in chemoinformatics. So far, such maps have been dedicated to specific compound collections: either limited series with known activities, or large, even exhaustive, enumerations of molecules without associated property data. Typically, they were challenged to answer some classification problem concerning those same molecules, admired for their aesthetic virtues, and then forgotten, because they were set-specific constructs. This work addresses the question of whether a general, compound set-independent map can be generated, and whether the claim of "universality" can be quantitatively justified with respect to all the structure-activity information available so far or, more realistically, an exploitable but significant fraction thereof. The "universal" CS map is expected to project molecules from the initial CS into a lower-dimensional space that is neighborhood behavior-compliant with respect to a large panel of ligand properties. Such a map should be able to discriminate actives from inactives, or even support quantitative, neighborhood-based, parameter-free property prediction (regression) models, for a wide panel of targets and target families. It should be polypharmacologically competent, without requiring any target-specific parameter fitting. This work describes an evolutionary growth procedure for such maps, based on generative topographic mapping, followed by the validation of their polypharmacological competence. Validation was carried out with respect to a maximum of exploitable structure-activity information, covering all Homo sapiens proteins in the ChEMBL database, antiparasitic and antiviral data, etc. Five evolved maps satisfactorily solved hundreds of activity-based ligand classification challenges for targets, and even in vivo properties, independent of the training data. They also withstood chemogenomics-related challenges: cumulated responsibility vectors obtained by mapping target-specific ligand collections were shown to represent validated target descriptors, consistent with the currently accepted target classification in biology. Therefore, they represent, in our opinion, a robust and well-documented answer to the key question "What is a good CS map?"
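The two map-based quantities mentioned above, neighborhood-based parameter-free prediction and cumulated responsibility vectors, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a trained generative topographic map has already produced a responsibility matrix (one row per molecule, one column per map node, each row summing to 1). The function names, the 3-node toy map and all numbers are hypothetical.

```python
import numpy as np

def cumulated_responsibility(resp):
    """Sum molecule responsibilities over a ligand set, normalised to sum to 1."""
    total = resp.sum(axis=0)
    return total / total.sum()

def node_active_fraction(resp, is_active):
    """Per-node share of responsibility mass that comes from active molecules."""
    is_active = np.asarray(is_active, dtype=bool)
    active_mass = resp[is_active].sum(axis=0)
    total_mass = resp.sum(axis=0)
    # Assumption: nodes that receive no mass at all get a neutral value of 0.5.
    return np.divide(active_mass, total_mass,
                     out=np.full_like(total_mass, 0.5),
                     where=total_mass > 0)

def predict_active_probability(resp_new, node_fraction):
    """Responsibility-weighted average of node values; nothing is fitted per target."""
    return resp_new @ node_fraction

# Hypothetical responsibilities: 4 training molecules on a 3-node map.
train_resp = np.array([
    [0.8, 0.2, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.2, 0.8],
    [0.0, 0.4, 0.6],
])
train_active = [True, True, False, False]

node_fraction = node_active_fraction(train_resp, train_active)

# Two new molecules, each residing entirely in one node of the toy map.
new_resp = np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
scores = predict_active_probability(new_resp, node_fraction)

# Descriptor of a hypothetical target whose ligands are the two actives.
target_vector = cumulated_responsibility(train_resp[:2])
```

In this toy setting, a molecule is scored only from the activity labels of the training molecules that share its map neighborhood, and a target is described by where its ligands accumulate on the map. Descriptors of different targets could then be compared with any vector similarity measure.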