Department of Applied Chemistry, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan.
Mol Inform. 2019 Mar;38(3):e1800088. doi: 10.1002/minf.201800088. Epub 2018 Sep 27.
This paper introduces two generative topographic mapping (GTM) methods that can be used for data visualization, regression analysis, inverse analysis, and the determination of applicability domains (ADs). In GTM-multiple linear regression (GTM-MLR), the prior probability distribution of the descriptors or explanatory variables (X) is calculated with GTM, and the posterior probability distribution of the property/activity or objective variable (y) given X is calculated with MLR; inverse analysis is then performed using the product rule and Bayes' theorem. In GTM-regression (GTMR), X and y are combined and GTM is applied to the combined data to obtain the joint probability distribution of X and y; from this joint distribution, the posterior probability distributions of y given X and of X given y are derived and used for regression and inverse analysis, respectively. Simulations using linear and nonlinear datasets, together with quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) datasets, confirm that GTM-MLR and GTMR enable data visualization, regression analysis, and inverse analysis while taking appropriate ADs into account. Python and MATLAB codes for the proposed algorithms are available at https://github.com/hkaneko1985/gtm-generativetopographicmapping.
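The conditioning step behind GTMR can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper's code: the joint distribution of X and y is represented as an equal-weight mixture of isotropic Gaussians with a shared precision `beta` (the general form a GTM density takes), the component centers are placed by hand on the curve y = x^2 rather than fitted by GTM training, and the function name `conditional_mixture` is hypothetical. Under those assumptions, conditioning on X gives the distribution of y used for regression, and conditioning on y gives the distribution of X used for inverse analysis.

```python
import numpy as np


def conditional_mixture(centers, beta, observed_idx, observed_value):
    """Condition an equal-weight isotropic Gaussian mixture on some coordinates.

    centers        : (K, D) component means in the joint (X, y) space
    beta           : shared precision (inverse variance) of every component
    observed_idx   : indices of the coordinates whose values are given
    observed_value : the given values, in the same order as observed_idx

    Returns the posterior component weights and the component means of the
    remaining coordinates. Each conditional component keeps variance 1/beta.
    """
    centers = np.asarray(centers, dtype=float)
    observed_idx = np.asarray(observed_idx)
    target_idx = np.setdiff1d(np.arange(centers.shape[1]), observed_idx)

    # Log-likelihood of the observed coordinates under each component.
    # Normalizing constants are identical across components and cancel.
    diff = centers[:, observed_idx] - np.asarray(observed_value, dtype=float)
    log_w = -0.5 * beta * np.sum(diff ** 2, axis=1)
    log_w -= log_w.max()  # numerical stability before exponentiating
    weights = np.exp(log_w)
    weights /= weights.sum()
    return weights, centers[:, target_idx]


# Hand-placed centers on y = x^2 (an illustrative stand-in for a fitted GTM).
x_centers = np.linspace(-2.0, 2.0, 41)
joint_centers = np.column_stack([x_centers, x_centers ** 2])  # columns: X, y
beta = 100.0  # assumed precision, i.e. standard deviation 0.1

# Regression: distribution of y given X = 1.0, summarized by its mean.
w_fwd, y_centers = conditional_mixture(joint_centers, beta, [0], [1.0])
y_mean = float(w_fwd @ y_centers[:, 0])

# Inverse analysis: distribution of X given y = 1.0.
w_inv, x_targets = conditional_mixture(joint_centers, beta, [1], [1.0])
x_mode = float(x_targets[np.argmax(w_inv), 0])
x_mean = float(w_inv @ x_targets[:, 0])

print("E[y | X=1.0]        :", y_mean)
print("most probable X|y=1 :", x_mode)
print("E[X | y=1.0]        :", x_mean)
```

In this toy setting the posterior of X given y = 1.0 has two peaks, near X = -1 and X = 1, so its mean falls between them and is a poor summary. This is one reason an inverse analysis would normally inspect the full posterior rather than a single point estimate.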