Lifting the curse from high-dimensional data: automated projection pursuit clustering for a variety of biological data modalities.

Authors

Simpson Claire, Tabatsky Evgeniy, Rahil Zainab, Eddins Devon J, Tkachev Sasha, Georgescauld Florian, Papalegis Derek, Culka Martin, Levy Tyler, Gregoretti Ivan, Meehan Connor, Schiller Chiara, Bestak Kresimir, Schapiro Denis, Chernyshev Andrei, Walther Guenther, Ghosn Eliver E B, Orlova Darya

Affiliations

Cell Signaling Technology, Danvers, MA 01915, USA.

Independent researcher, Komsomolsk-on-Amur 681021, Russia.

Publication

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf052.

DOI:10.1093/gigascience/giaf052
PMID:40440093
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12121483/
Abstract

Unsupervised clustering is a powerful machine-learning technique widely used to analyze high-dimensional biological data. It plays a crucial role in uncovering patterns, structures, and inherent relationships within complex datasets without relying on predefined labels. In the context of biology, high-dimensional data may include transcriptomics, proteomics, and a variety of single-cell omics data. Most existing clustering algorithms operate directly in the high-dimensional space, and their performance may be negatively affected by the phenomenon known as the curse of dimensionality. Here, we show an alternative clustering approach that alleviates the curse by sequentially projecting high-dimensional data into a low-dimensional representation. We validated the effectiveness of our approach, named automated projection pursuit (APP), across various biological data modalities, including flow and mass cytometry data, scRNA-seq, multiplex imaging data, and T-cell receptor repertoire data. APP efficiently recapitulated experimentally validated cell-type definitions and revealed new biologically meaningful patterns.
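The abstract describes APP only at a high level: instead of clustering in the full high-dimensional space, data are projected sequentially into a low-dimensional representation and separated there. The following is a toy sketch of that general projection-pursuit idea under our own assumptions. It is not the authors' APP implementation. It projects the current subset onto each coordinate axis, scores how cleanly each 1-D projection splits into two groups with Otsu's between-group variance criterion (a thresholding rule from image analysis, swapped in here and not taken from the paper), splits on the best-scoring axis, and recurses until no projection scores above a threshold. The function names `best_split_1d` and `projection_pursuit_clusters` and the parameters `min_score` and `min_size` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def best_split_1d(values: List[float]) -> Tuple[float, float]:
    """Best two-group split of 1-D values by Otsu's criterion (assumed stand-in).

    Returns (score, threshold), where score = between-group variance /
    total variance, in [0, 1]; values near 1 mean two well-separated modes.
    """
    xs = sorted(values)
    n = len(xs)
    total_sum = sum(xs)
    mean = total_sum / n
    total_var = sum((x - mean) ** 2 for x in xs) / n
    if total_var == 0.0:
        return 0.0, xs[0]  # constant projection: nothing to split
    best_score, best_thr = 0.0, xs[0]
    left_sum = 0.0
    for k in range(1, n):
        left_sum += xs[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # no threshold fits between equal values
        w0, w1 = k / n, (n - k) / n
        mu0 = left_sum / k
        mu1 = (total_sum - left_sum) / (n - k)
        score = w0 * w1 * (mu0 - mu1) ** 2 / total_var
        if score > best_score:
            best_score, best_thr = score, (xs[k - 1] + xs[k]) / 2
    return best_score, best_thr


def projection_pursuit_clusters(
    points: List[Point], min_score: float = 0.8, min_size: int = 5
) -> List[int]:
    """Label points by recursively splitting on the best 1-D axis projection."""
    labels = [0] * len(points)
    next_label = 0

    def recurse(idx: List[int]) -> None:
        nonlocal next_label
        best: Optional[Tuple[float, int, float]] = None
        if idx and len(idx) >= 2 * min_size:
            # Project the subset onto each coordinate axis and score the split.
            for axis in range(len(points[idx[0]])):
                score, thr = best_split_1d([points[i][axis] for i in idx])
                if best is None or score > best[0]:
                    best = (score, axis, thr)
        if best is not None and best[0] >= min_score:
            _, axis, thr = best
            left = [i for i in idx if points[i][axis] <= thr]
            right = [i for i in idx if points[i][axis] > thr]
            if len(left) >= min_size and len(right) >= min_size:
                recurse(left)
                recurse(right)
                return
        # No projection separates this subset well enough: it is one cluster.
        for i in idx:
            labels[i] = next_label
        next_label += 1

    recurse(list(range(len(points))))
    return labels


# Toy data: two groups of 10 points in 3-D, offset by 10 on the first two axes.
group_a = [(0.01 * (i % 10), 0.01 * (3 * i % 10), 0.01 * (7 * i % 10)) for i in range(10)]
group_b = [(10 + x, 10 + y, z) for x, y, z in group_a]
print(projection_pursuit_clusters(group_a + group_b))
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Each split decision here is made on a one-dimensional projection, so no distances are computed in the full feature space. The threshold of 0.8 is chosen to sit above the best score a single Gaussian (about 0.64) or uniform (0.75) projection can reach. A practical method would likely use richer projections than single axes and a more robust test for multimodality.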

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/12121483/3d3879a23bb0/giaf052fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/12121483/055108e9bbbe/giaf052fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/12121483/1ed3dce6cd6e/giaf052fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/12121483/2f58e44facde/giaf052fig4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/12121483/c87fb6be15f0/giaf052fig5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d05/12121483/4be9f656d38f/giaf052fig6.jpg

Similar articles

1. Lifting the curse from high-dimensional data: automated projection pursuit clustering for a variety of biological data modalities.
   Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf052.
2. scZAG: Integrating ZINB-Based Autoencoder with Adaptive Data Augmentation Graph Contrastive Learning for scRNA-seq Clustering.
   Int J Mol Sci. 2024 May 29;25(11):5976. doi: 10.3390/ijms25115976.
3. Graph contrastive learning as a versatile foundation for advanced scRNA-seq data analysis.
   Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae558.
4. Multi-level multi-view network based on structural contrastive learning for scRNA-seq data clustering.
   Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae562.
5. scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization.
   Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae228.
6. scMUSCL: multi-source transfer learning for clustering scRNA-seq data.
   Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf137.
7. diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering.
   Commun Biol. 2019 May 14;2:183. doi: 10.1038/s42003-019-0415-5. eCollection 2019.
8. A hybrid deep clustering approach for robust cell type profiling using single-cell RNA-seq data.
   RNA. 2020 Oct;26(10):1303-1319. doi: 10.1261/rna.074427.119. Epub 2020 Jun 12.
9. Decoupled GNNs based on multi-view contrastive learning for scRNA-seq data clustering.
   Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf198.
10. nsDCC: dual-level contrastive clustering with nonuniform sampling for scRNA-seq data analysis.
   Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae477.

Cited by

1. On the dilemma of using single EV analysis for liquid biopsy: the challenge of low abundance of tumor EVs in blood.
   Theranostics. 2025 Jul 24;15(16):8031-8048. doi: 10.7150/thno.115131. eCollection 2025.
2. Automatic phenotyping using exhaustive projection pursuit.
   Commun Biol. 2025 Aug 12;8(1):1207. doi: 10.1038/s42003-025-08581-z.
3. Automatic Phenotyping Using Exhaustive Projection Pursuit.
