Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan.
J Chem Inf Model. 2018 Dec 24;58(12):2528-2535. doi: 10.1021/acs.jcim.8b00528. Epub 2018 Oct 31.
To achieve simultaneous data visualization and clustering, sparse generative topographic mapping (SGTM) is developed by modifying the conventional generative topographic mapping (GTM) algorithm. In the original GTM the weight of each grid point is a constant, whereas in the proposed SGTM it is a variable, which enables data points to be clustered on two-dimensional maps. The appropriate number of clusters is determined by optimization based on the Bayesian information criterion. Analysis of numerical simulation data sets, along with quantitative structure-property relationship and quantitative structure-activity relationship data sets, confirmed that the proposed SGTM provides the same degree of visualization performance as the original GTM and clusters data points appropriately. Python and MATLAB codes for the proposed algorithm are available at https://github.com/hkaneko1985/gtm-generativetopographicmapping.
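The difference between a constant and a variable grid-point weight, and the role of the Bayesian information criterion, can be illustrated with a minimal pure-Python sketch. It is not the code from the repository above and does not reproduce the published SGTM update rules. It assumes a generic mixture-model style update in which each weight is the mean responsibility of its grid point, and it uses the standard definition BIC = -2 ln L + k ln n. The function names, the pruning threshold, and the toy responsibility matrix are all hypothetical and chosen only for illustration.

```python
import math


def uniform_weights(n_grid):
    """Original GTM: every grid point has the same fixed weight 1/K."""
    return [1.0 / n_grid] * n_grid


def update_weights(responsibilities):
    """Assumed mixture-style update: weight k is the mean of R[n][k] over samples.

    responsibilities[n][k] is the posterior probability that sample n
    belongs to grid point k; each row sums to 1.
    """
    n_samples = len(responsibilities)
    n_grid = len(responsibilities[0])
    return [
        sum(row[k] for row in responsibilities) / n_samples
        for k in range(n_grid)
    ]


def active_grid_points(weights, threshold=1e-6):
    """Grid points whose weight stays above a (hypothetical) threshold."""
    return [k for k, w in enumerate(weights) if w > threshold]


def assign_clusters(responsibilities):
    """Assign each sample to the grid point with the largest responsibility."""
    return [max(range(len(row)), key=row.__getitem__) for row in responsibilities]


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


# Toy responsibilities for 4 samples and 3 grid points (made-up numbers).
R = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 1.0, 0.0],
]

fixed = uniform_weights(3)           # constant weights, as in the original GTM
variable = update_weights(R)         # weights that adapt to the data
active = active_grid_points(variable)
labels = assign_clusters(R)

# Hypothetical (log-likelihood, parameter count) pairs per candidate model.
candidates = {2: (-100.0, 5), 3: (-98.0, 8)}
scores = {k: bic(ll, p, n_samples=50) for k, (ll, p) in candidates.items()}
best_n_clusters = min(scores, key=scores.get)
```

In this toy case the third grid point receives no responsibility, so its weight falls to zero and only two grid points remain active, each acting as a cluster. The BIC comparison shows how a small gain in likelihood can be outweighed by the penalty for extra parameters.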