The specious art of single-cell genomics.

Affiliations

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America.

Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, United States of America.

Publication information

PLoS Comput Biol. 2023 Aug 17;19(8):e1011288. doi: 10.1371/journal.pcbi.1011288. eCollection 2023 Aug.

DOI: 10.1371/journal.pcbi.1011288
PMID: 37590228
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10434946/
Abstract

Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.
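
The distortion claim in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes synthetic Gaussian data in place of a real cells-by-genes matrix, and it uses plain PCA (via NumPy's SVD) as the 2-dimensional embedding rather than the nonlinear methods common in single-cell workflows. The helper names `pairwise_distances` and `pca_embed` are hypothetical. The sketch compares every pairwise distance before and after the reduction.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance for every distinct pair of rows in X."""
    i, j = np.triu_indices(X.shape[0], k=1)
    return np.linalg.norm(X[i] - X[j], axis=1)


def pca_embed(X, k=2):
    """Project the rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


# Synthetic stand-in for a cells-by-genes matrix (an assumption, not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))
Y = pca_embed(X, k=2)

# Ratio of embedded to original distance for each pair of points.
ratios = pairwise_distances(Y) / pairwise_distances(X)
# Spread between the best- and worst-preserved pair.
distortion = ratios.max() / ratios.min()

print(f"distance ratios range from {ratios.min():.4f} to {ratios.max():.4f}")
print(f"max/min ratio (distortion): {distortion:.1f}")
```

A max/min ratio of 1 would mean that every distance was scaled by the same factor. Because PCA is an orthogonal projection, it can only shrink distances, so each ratio is at most 1; the spread between the largest and smallest ratio shows how unevenly the pairs are compressed.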

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72e3/10434946/d9a36c6b5e93/pcbi.1011288.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72e3/10434946/1f78a2aa63db/pcbi.1011288.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72e3/10434946/259e1c67c9cd/pcbi.1011288.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72e3/10434946/a6f362735f6b/pcbi.1011288.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72e3/10434946/98bef1890dbd/pcbi.1011288.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72e3/10434946/9e5b8f60a80b/pcbi.1011288.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72e3/10434946/951fc64fb5ce/pcbi.1011288.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72e3/10434946/e1ba33cc44f3/pcbi.1011288.g008.jpg
