



Calibrating dimension reduction hyperparameters in the presence of noise.

Affiliations

Department of Mathematics, Indiana University, Bloomington, Indiana, United States of America.

Department of Statistics, Indiana University, Bloomington, Indiana, United States of America.

Publication information

PLoS Comput Biol. 2024 Sep 12;20(9):e1012427. doi: 10.1371/journal.pcbi.1012427. eCollection 2024 Sep.

DOI:10.1371/journal.pcbi.1012427
PMID:39264943
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11421778/
Abstract

The goal of dimension reduction tools is to construct a low-dimensional representation of high-dimensional data. These tools are employed for a variety of reasons, such as noise reduction, visualization, and lowering computational costs. However, a fundamental issue that is widely discussed in other modeling problems is often overlooked in dimension reduction: overfitting. In other modeling problems, techniques such as feature selection, cross-validation, and regularization are employed to combat overfitting, but such precautions are rarely taken when applying dimension reduction. Prior applications of the two most popular non-linear dimension reduction methods, t-SNE and UMAP, fail to acknowledge that data are a combination of signal and noise when assessing performance. These methods are typically calibrated to capture the entirety of the data, not just the signal. In this paper, we demonstrate the importance of acknowledging noise when calibrating hyperparameters and present a framework that enables users to do so. We use this framework to explore the role hyperparameter calibration plays in overfitting the data when applying t-SNE and UMAP. More specifically, we show that previously recommended values for perplexity and n_neighbors are too small and overfit the noise. We also provide a workflow others may use to calibrate hyperparameters in the presence of noise.
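The abstract's central claim is that small values of perplexity (t-SNE) or n_neighbors (UMAP) make an embedding depend on very local neighborhoods, which are exactly what noise perturbs most. The Python sketch that follows illustrates two ingredients of that argument under our own assumptions; it is not the authors' framework or workflow, and every function name in it (`calibrate_row`, `knn_sets`, `neighbor_overlap`) is hypothetical. First, it shows how t-SNE turns perplexity into a per-point Gaussian bandwidth: it binary-searches the precision beta = 1/(2 sigma^2) until 2^H(P_i) matches the target perplexity, so perplexity acts roughly as an effective number of neighbors. Second, it builds a toy signal (points on a circle), embeds it in 10 dimensions with additive Gaussian noise, and measures what fraction of each point's k nearest neighbors in the noisy data are also among its k nearest neighbors in the noise-free signal. The circle, the noise level, and this overlap score are illustrative choices, not taken from the paper.

```python
import numpy as np


def calibrate_row(d2_row, perplexity, tol=1e-5, max_iter=200):
    """Binary-search the Gaussian precision beta = 1 / (2 sigma^2) for one point.

    d2_row holds squared distances from point i to every *other* point.
    Returns p_{j|i} and beta such that 2**H(P_i) is close to `perplexity`.
    """
    target = np.log2(perplexity)  # target entropy in bits
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        logits = -beta * d2_row
        logits -= logits.max()  # subtract the max for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        nz = p > 0
        h = -np.sum(p[nz] * np.log2(p[nz]))  # Shannon entropy of P_i in bits
        if abs(h - target) < tol:
            break
        if h > target:
            # Too flat (too many effective neighbors): raise beta, i.e. shrink sigma.
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:
            # Too peaked (too few effective neighbors): lower beta, i.e. widen sigma.
            hi = beta
            beta = (beta + lo) / 2.0
    return p, beta


def knn_sets(x, k):
    """Indices of the k nearest neighbors of each row of x, excluding the row itself."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    order = np.argsort(d2, axis=1)[:, :k]
    return [set(row.tolist()) for row in order]


def neighbor_overlap(signal, observed, k):
    """Mean fraction of each point's k-NN in `signal` that are also its k-NN in `observed`."""
    a = knn_sets(signal, k)
    b = knn_sets(observed, k)
    return float(np.mean([len(sa & sb) / k for sa, sb in zip(a, b)]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    # Hypothetical signal: points on a circle (a 1-D manifold) in 2-D.
    t = rng.uniform(0.0, 2.0 * np.pi, n)
    signal = np.column_stack([np.cos(t), np.sin(t)])
    # Observed data: the signal padded to 10-D plus isotropic Gaussian noise.
    observed = np.hstack([signal, np.zeros((n, 8))]) + 0.2 * rng.normal(size=(n, 10))

    for k in (5, 15, 30, 60):
        score = neighbor_overlap(signal, observed, k)
        print(f"k={k:3d}  overlap with signal neighbors: {score:.2f}")

    # Perplexity -> Gaussian bandwidth for the first point of the noisy data.
    d2 = np.sum((observed[0] - observed[1:]) ** 2, axis=1)
    for perp in (5.0, 30.0, 100.0):
        _, beta = calibrate_row(d2, perp)
        print(f"perplexity={perp:5.1f}  sigma={np.sqrt(1.0 / (2.0 * beta)):.3f}")
```

In a toy setting like this one would expect the overlap to be lower for small k, because noise reorders a point's closest neighbors more easily than its broader neighborhood; at k = n - 1 the overlap is trivially 1. Running t-SNE or UMAP themselves (for example scikit-learn's `TSNE(perplexity=...)` or umap-learn's `UMAP(n_neighbors=...)`) is left out to keep the sketch dependency-free beyond NumPy.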



Similar articles

1. Calibrating dimension reduction hyperparameters in the presence of noise.
   PLoS Comput Biol. 2024 Sep 12;20(9):e1012427. doi: 10.1371/journal.pcbi.1012427. eCollection 2024 Sep.
2. PARE: A framework for removal of confounding effects from any distance-based dimension reduction method.
   PLoS Comput Biol. 2024 Jul 10;20(7):e1012241. doi: 10.1371/journal.pcbi.1012241. eCollection 2024 Jul.
3. A generalization of t-SNE and UMAP to single-cell multimodal omics.
   Genome Biol. 2021 May 3;22(1):130. doi: 10.1186/s13059-021-02356-5.
4. Exploring nonlinear feature space dimension reduction and data representation in breast Cadx with Laplacian eigenmaps and t-SNE.
   Med Phys. 2010 Jan;37(1):339-51. doi: 10.1118/1.3267037.
5. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters.
   Nat Commun. 2024 Feb 26;15(1):1753. doi: 10.1038/s41467-024-45891-y.
6. DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry data.
   PLoS Comput Biol. 2022 Apr 11;18(4):e1008885. doi: 10.1371/journal.pcbi.1008885. eCollection 2022 Apr.
7. Ant Colony-Based Hyperparameter Optimisation in Total Variation Reconstruction in X-ray Computed Tomography.
   Sensors (Basel). 2021 Jan 15;21(2):591. doi: 10.3390/s21020591.
8. Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization.
   Commun Biol. 2022 Jul 19;5(1):719. doi: 10.1038/s42003-022-03628-x.
9. Application of t-SNE to human genetic data.
   J Bioinform Comput Biol. 2017 Aug;15(4):1750017. doi: 10.1142/S0219720017500172. Epub 2017 Jun 23.
10. Impact of calibrating a low-cost capacitance-based soil moisture sensor on AquaCrop model performance.
   J Environ Manage. 2024 Feb 27;353:120248. doi: 10.1016/j.jenvman.2024.120248. Epub 2024 Feb 6.
