
Resolution of the curse of dimensionality in single-cell RNA sequencing data analysis.

Affiliations

Institute for the Advanced Study of Human Biology, Kyoto University Institute for Advanced Study, Kyoto University, Kyoto, Japan.

Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan.

Publication information

Life Sci Alliance. 2022 Aug 9;5(12):e202201591. doi: 10.26508/lsa.202201591.

DOI: 10.26508/lsa.202201591
PMID: 35944930
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9363502/

Abstract

Single-cell RNA sequencing (scRNA-seq) can determine gene expression in numerous individual cells simultaneously, promoting progress in the biomedical sciences. However, scRNA-seq data are high-dimensional with substantial technical noise, including dropouts. During analysis of scRNA-seq data, such noise engenders a statistical problem known as the curse of dimensionality (COD). Based on high-dimensional statistics, we herein formulate a noise reduction method, RECODE (resolution of the curse of dimensionality), for high-dimensional data with random sampling noise. We show that RECODE consistently resolves COD in relevant scRNA-seq data with unique molecular identifiers. RECODE does not involve dimension reduction and recovers expression values for all genes, including lowly expressed genes, realizing precise delineation of cell fate transitions and identification of rare cells with all gene information. Compared with representative imputation methods, RECODE employs different principles and exhibits superior overall performance in cell-clustering, expression value recovery, and single-cell-level analysis. The RECODE algorithm is parameter-free, data-driven, deterministic, and high-speed, and its applicability can be predicted based on the variance normalization performance. We propose RECODE as a powerful strategy for preprocessing noisy high-dimensional data.
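
The curse of dimensionality mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the RECODE algorithm; it is a toy model built on assumptions that are not taken from the paper: two hypothetical cell groups, Poisson sampling noise, a baseline of 5 counts per gene, and 10 differentially expressed genes. The function name `distance_contrast` and all of its parameters are invented for this illustration. It shows how, as more noisy genes are included, the accumulated sampling noise makes between-group and within-group distances nearly indistinguishable.

```python
import numpy as np

def distance_contrast(n_genes, n_cells=20, n_signal=10, seed=0):
    """Ratio of mean between-group to mean within-group Euclidean distance.

    Toy model (assumed, not from the paper): counts are Poisson-sampled
    around group-specific true means.
    """
    rng = np.random.default_rng(seed)
    # True mean expression: both groups share a baseline of 5 counts per gene;
    # group B is up-regulated to 15 in the first n_signal genes only.
    mean_a = np.full(n_genes, 5.0)
    mean_b = mean_a.copy()
    mean_b[:n_signal] = 15.0
    # Observed counts: independent Poisson sampling noise around the true means.
    a = rng.poisson(mean_a, size=(n_cells, n_genes)).astype(float)
    b = rng.poisson(mean_b, size=(n_cells, n_genes)).astype(float)

    # Pairwise Euclidean distances between cells of group A and group B.
    d_ab = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Pairwise distances within group A, excluding zero self-distances.
    d_aa = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=2)
    within = d_aa[~np.eye(n_cells, dtype=bool)].mean()
    return d_ab.mean() / within

if __name__ == "__main__":
    for n_genes in (10, 100, 1000, 5000):
        print(n_genes, round(distance_contrast(n_genes), 3))
```

In this toy model every gene adds sampling variance to the squared distance between any two cells, while only the 10 signal genes add true biological difference, so the ratio drifts toward 1 as the gene count grows. A noise reduction step applied before distance-based analysis is meant to counteract this effect.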

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/db6686694583/LSA-2022-01591_Fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/60734d9269c9/LSA-2022-01591_FigS1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/702443d3f30c/LSA-2022-01591_FigS2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/722310ef3c55/LSA-2022-01591_Fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/92d97541887d/LSA-2022-01591_FigS3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/f2be6c8fecf4/LSA-2022-01591_FigS4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/c6bcebe5e234/LSA-2022-01591_FigS5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/309cfa24892d/LSA-2022-01591_FigS6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/8588f37be5d2/LSA-2022-01591_Fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/65b5fc32491c/LSA-2022-01591_FigS7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/8c27d829227f/LSA-2022-01591_Fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/e6a0676aa7f0/LSA-2022-01591_FigS8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e236/9363502/91eb77dc4244/LSA-2022-01591_FigS9.jpg
