Undersampling techniques for non-linear chemical space visualization.

Authors

Surendran Akash, Zsigmond Krisztina, Miranda-Quintana Ramón Alain

Affiliation

Quantum Theory Project and Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States.

Publication information

bioRxiv. 2025 Jul 7:2025.07.03.663077. doi: 10.1101/2025.07.03.663077.

Abstract

The visualization of high-dimensional chemical space is a critical tool for understanding molecular diversity and structure-property relationships, and for guiding compound selection. However, the performance of non-linear dimensionality reduction (DR) techniques such as t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Generative Topographic Mapping (GTM) is often sensitive to the choice of hyperparameters, and training these methods on large datasets is computationally expensive. In this study, we investigated the effect of undersampling methods on hyperparameter selection for these non-linear DR methods. Our results demonstrate that selecting small, representative subsets of chemical data not only reduces the computational cost of hyperparameter tuning but also provides an innovative way to train non-linear DR methods, leading to projections that better preserve the local structure of chemical space.
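The general idea of tuning a non-linear DR method on a small subset can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes uniform random undersampling as a stand-in for the undersampling methods studied, uses scikit-learn's `TSNE` with Euclidean distances, scores local-structure preservation with scikit-learn's `trustworthiness`, and runs on synthetic binary vectors rather than real molecular fingerprints. The helper names `random_undersample` and `pick_perplexity` are hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness


def random_undersample(X, n_subset, seed=0):
    """Pick n_subset rows uniformly at random, without replacement.

    Assumption: random sampling stands in for the paper's undersampling methods.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_subset, replace=False)
    return X[idx]


def pick_perplexity(X_subset, candidates, n_neighbors=5, seed=0):
    """Embed the subset once per candidate perplexity and score each embedding.

    The score is trustworthiness, which measures how well the k nearest
    neighbors in the embedding reflect neighbors in the original space.
    """
    scores = {}
    for perplexity in candidates:
        embedding = TSNE(
            n_components=2, perplexity=perplexity, init="pca", random_state=seed
        ).fit_transform(X_subset)
        scores[perplexity] = trustworthiness(
            X_subset, embedding, n_neighbors=n_neighbors
        )
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-in for molecular fingerprints (hypothetical data, not from the paper).
rng = np.random.default_rng(42)
X_full = rng.integers(0, 2, size=(300, 64)).astype(float)

# Tune on a small subset instead of the full dataset.
X_small = random_undersample(X_full, n_subset=100)
best_perplexity, scores = pick_perplexity(X_small, candidates=[5, 15, 30])
print(best_perplexity, scores)
```

Each candidate is fitted on 100 points rather than 300, so the search is cheaper than tuning on the full set; whether the selected value transfers well to the full dataset is exactly the kind of question the study addresses.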

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c55/12265540/d95705438e5b/nihpp-2025.07.03.663077v1-f0001.jpg
