Undersampling techniques for non-linear chemical space visualization.

Authors

Surendran Akash, Zsigmond Krisztina, Miranda-Quintana Ramón Alain

Affiliation

Quantum Theory Project and Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States.

Publication

bioRxiv. 2025 Jul 7:2025.07.03.663077. doi: 10.1101/2025.07.03.663077.

DOI:10.1101/2025.07.03.663077
PMID:40672189
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12265540/
Abstract

The visualization of high-dimensional chemical space is a critical tool for understanding molecular diversity and structure-property relationships, and for guiding compound selection. However, the performance of non-linear dimensionality reduction (DR) techniques such as t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Generative Topographic Mapping (GTM) is often sensitive to the choice of hyperparameters, and training these methods on large datasets is costly. In this study, we investigated the effect of undersampling methods on hyperparameter selection for these non-linear DR methods. Our results demonstrate that selecting small representative subsets of chemical data not only reduces the computational cost of hyperparameter tuning but also offers a new way to train non-linear DR methods, leading to projections that better preserve the local structure of the chemical space.
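The idea of tuning a non-linear DR method on an undersampled subset can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' protocol: the fingerprints are synthetic random bit vectors, the sampler is plain uniform random sampling standing in for whichever undersampling method is being studied, t-SNE perplexity is the only hyperparameter searched, and scikit-learn's `trustworthiness` is used as one possible measure of local-structure preservation. The helper names `undersample` and `pick_perplexity` are hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness
from sklearn.metrics import pairwise_distances


def undersample(n_items, n_keep, seed=0):
    """Return sorted indices of a uniform random subset (stand-in sampler)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_items, size=n_keep, replace=False))


def pick_perplexity(fps, candidates, n_neighbors=10, seed=0):
    """Fit t-SNE once per candidate perplexity and score each embedding."""
    # Jaccard distance on binary fingerprints equals 1 - Tanimoto similarity.
    dist = pairwise_distances(fps, metric="jaccard")
    scores = {}
    for perplexity in candidates:
        emb = TSNE(
            n_components=2,
            perplexity=perplexity,
            metric="precomputed",
            init="random",  # required when distances are precomputed
            random_state=seed,
        ).fit_transform(dist)
        # Trustworthiness lies in [0, 1]; higher means neighbors in the
        # 2D map were also neighbors in the original space.
        scores[perplexity] = trustworthiness(
            dist, emb, n_neighbors=n_neighbors, metric="precomputed"
        )
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic binary "fingerprints" (assumption: 400 molecules, 64 bits each).
rng = np.random.default_rng(42)
fingerprints = rng.random((400, 64)) < 0.2

# Tune on a 25% subset instead of the full set.
subset_idx = undersample(len(fingerprints), n_keep=100)
best_perplexity, scores = pick_perplexity(
    fingerprints[subset_idx], candidates=(5, 15, 30)
)
print(best_perplexity, scores)
```

The search cost scales with the subset size, not the full library size, which is where the saving comes from. Whether the perplexity chosen on the subset transfers to the full dataset depends on how representative the subset is, so a random sampler like this one is only a baseline.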

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c55/12265540/d95705438e5b/nihpp-2025.07.03.663077v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c55/12265540/a21d715dcbe6/nihpp-2025.07.03.663077v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c55/12265540/3afc22a5743b/nihpp-2025.07.03.663077v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c55/12265540/8327dd5eddd4/nihpp-2025.07.03.663077v1-f0004.jpg
