Gaspar Héléna A, Marcou Gilles, Horvath Dragos, Arault Alban, Lozano Sylvain, Vayer Philippe, Varnek Alexandre
Faculté de Chimie, Université de Strasbourg, UMR 7140-Laboratoire de Chémoinformatique, 1 rue Blaise Pascal, 67000 Strasbourg, France.
J Chem Inf Model. 2013 Dec 23;53(12):3318-25. doi: 10.1021/ci400423c. Epub 2013 Dec 9.
Earlier (Kireeva et al. Mol. Inf. 2012, 31, 301-312), we demonstrated that generative topographic mapping (GTM) can be used efficiently both for data visualization and for building classification models in the initial D-dimensional space of molecular descriptors. Here, we describe modeling in the two-dimensional latent space for the four classes of the Biopharmaceutics Drug Disposition Classification System (BDDCS) using VolSurf descriptors. Three new definitions of the applicability domain (AD) of models are proposed: one class-independent AD, which considers the GTM likelihood, and two class-dependent ADs, which consider either the predominant class in a given node of the map or the informational entropy. The class entropy AD was found to be the most efficient for BDDCS modeling. The predominant class AD can be visualized directly on GTM maps, which aids interpretation of the model.
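The three AD definitions named in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: that a trained GTM supplies a responsibility matrix (molecules by nodes) and per-molecule log-likelihoods, that node class probabilities are responsibility-weighted class frequencies, and that each AD is a simple threshold test. The function names and the thresholds (`percentile`, `min_prob`, `max_entropy`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def node_class_probabilities(resp, labels, n_classes):
    """Estimate P(class | node) from training responsibilities.

    resp   : (n_molecules, n_nodes) GTM responsibilities (assumed given)
    labels : (n_molecules,) integer class labels in [0, n_classes)
    """
    onehot = np.eye(n_classes)[labels]            # (n_molecules, n_classes)
    weights = resp.T @ onehot                     # (n_nodes, n_classes)
    totals = weights.sum(axis=1, keepdims=True)   # (n_nodes, 1)
    safe_totals = np.where(totals > 0, totals, 1.0)
    # Nodes with no training data fall back to a uniform distribution.
    return np.where(totals > 0, weights / safe_totals, 1.0 / n_classes)


def normalized_entropy(probs):
    """Shannon entropy of each row, scaled to [0, 1] by log(n_classes)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1) / np.log(p.shape[-1])


def likelihood_ad(query_loglik, train_loglik, percentile=5.0):
    """Class-independent AD (assumed form): query log-likelihood must not
    fall below a low percentile of the training log-likelihoods."""
    return query_loglik >= np.percentile(train_loglik, percentile)


def predominant_class_ad(query_resp, node_probs, min_prob=0.5):
    """Class-dependent AD (assumed form): the most responsible node must
    have a predominant class with probability of at least min_prob."""
    best_node = query_resp.argmax(axis=1)
    return node_probs[best_node].max(axis=1) >= min_prob


def class_entropy_ad(query_resp, node_probs, max_entropy=0.8):
    """Class-dependent AD (assumed form): the entropy of the predicted
    class distribution must not exceed max_entropy."""
    class_probs = query_resp @ node_probs         # (n_queries, n_classes)
    return normalized_entropy(class_probs) <= max_entropy


# Toy data: 3 nodes, 4 classes (BDDCS has four classes); values are invented.
train_resp = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0],
                       [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1]], dtype=float)
train_labels = np.array([0, 0, 1, 1, 0, 1, 2, 3])
node_probs = node_class_probabilities(train_resp, train_labels, n_classes=4)

# First query sits on a pure node, second on a fully mixed node.
query_resp = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
in_predominant_ad = predominant_class_ad(query_resp, node_probs)
in_entropy_ad = class_entropy_ad(query_resp, node_probs)
```

In this toy setup the first query lands on a node populated by a single class and passes both class-dependent tests, while the second lands on a node where all four classes are equally represented and is rejected by both. How the published method actually sets its thresholds is not stated in the abstract.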