
Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data.

Affiliations

School of Mathematics and Statistics, Shandong University (Weihai), Weihai, 264209, China.

Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.

Publication information

BMC Bioinformatics. 2020 Oct 7;21(1):440. doi: 10.1186/s12859-020-03797-8.

DOI:10.1186/s12859-020-03797-8
PMID:33028196
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7541255/
Abstract

BACKGROUND

Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data.
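
The sensitivity to preprocessing described above can be illustrated with a minimal sketch. The abstract does not name the preprocessing methods that were compared, so the two transforms used here (counts-per-million scaling and a log2 transform) are assumptions chosen because they are common in single-cell workflows, and the three cells are invented toy data, not values from the paper. Cells A and B share the same expression profile but differ tenfold in sequencing depth; cell C has a different profile at a depth close to A.

```python
import math

def identity(counts):
    """Leave raw counts unchanged (no preprocessing)."""
    return [float(c) for c in counts]

def cpm(counts):
    """Scale one cell's counts to counts per million (library-size normalization)."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def log2_cpm(counts):
    """Counts per million followed by log2(x + 1)."""
    return [math.log2(v + 1.0) for v in cpm(counts)]

def euclidean(u, v):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbor(cells, name, transform):
    """Return the name of the cell closest to `name` after applying `transform`."""
    processed = {k: transform(v) for k, v in cells.items()}
    others = [k for k in processed if k != name]
    return min(others, key=lambda k: euclidean(processed[name], processed[k]))

# Hypothetical toy counts for three genes (illustrative only).
cells = {
    "A": [8, 2, 0],    # profile 80/20/0, shallow depth
    "B": [80, 20, 0],  # same profile as A, ten times deeper
    "C": [2, 8, 0],    # different profile, depth similar to A
}

for label, transform in [("raw", identity), ("cpm", cpm), ("log2(cpm+1)", log2_cpm)]:
    print(label, "-> nearest neighbor of A:", nearest_neighbor(cells, "A", transform))
```

On raw counts the nearest neighbor of A is C, because depth dominates the distance; after either normalization it is B. Any distance-based clustering built on these vectors would inherit that difference.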

RESULTS

We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3.
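
To make "the best data preprocessing method" concrete, the sketch below scores several candidate clusterings against known cell-type labels and picks the highest-scoring one. This is not SC3-e: the abstract does not describe how SC3-e works beyond being graph-based, and it presumably has to choose without reference labels. The metric (adjusted Rand index, a common choice for comparing partitions) is an assumption, since the abstract does not name one, and the method names and label vectors are invented for illustration.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree
    contingency = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_ij - expected) / (max_index - expected)

# Hypothetical reference labels for six cells (two cell types).
truth = [0, 0, 0, 1, 1, 1]

# Hypothetical cluster labels obtained after each preprocessing method.
candidates = {
    "raw": [0, 0, 1, 1, 0, 1],
    "cpm": [1, 1, 1, 0, 0, 0],   # same partition as truth, labels swapped
    "log2": [0, 0, 0, 1, 1, 0],
}

scores = {name: adjusted_rand_index(truth, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("best preprocessing by ARI:", best)
```

The index is invariant to label permutation, so the "cpm" partition scores as a perfect match even though its cluster IDs are swapped relative to the reference.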

CONCLUSION

The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdfc/7541255/5cbf33573427/12859_2020_3797_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdfc/7541255/d402836d9ce7/12859_2020_3797_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdfc/7541255/b0c97c5cd4af/12859_2020_3797_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdfc/7541255/ae99dcf6bfb5/12859_2020_3797_Fig4_HTML.jpg

Similar articles

1
Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data.
BMC Bioinformatics. 2020 Oct 7;21(1):440. doi: 10.1186/s12859-020-03797-8.
2
A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad497.
3
SAFE-clustering: Single-cell Aggregated (from Ensemble) clustering for single-cell RNA-seq data.
Bioinformatics. 2019 Apr 15;35(8):1269-1277. doi: 10.1093/bioinformatics/bty793.
4
SC3: consensus clustering of single-cell RNA-seq data.
Nat Methods. 2017 May;14(5):483-486. doi: 10.1038/nmeth.4236. Epub 2017 Mar 27.
5
NDRindex: a method for the quality assessment of single-cell RNA-Seq preprocessing data.
BMC Bioinformatics. 2020 Dec 16;21(Suppl 16):540. doi: 10.1186/s12859-020-03883-x.
6
Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.
BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):660. doi: 10.1186/s12859-019-3179-5.
7
scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data.
BMC Genomics. 2019 May 8;20(1):347. doi: 10.1186/s12864-019-5747-5.
8
Single-cell RNA-seq data clustering: A survey with performance comparison study.
J Bioinform Comput Biol. 2020 Aug;18(4):2040005. doi: 10.1142/S0219720020400053. Epub 2020 Aug 14.
9
SC3-seq: a method for highly parallel and quantitative measurement of single-cell gene expression.
Nucleic Acids Res. 2015 May 19;43(9):e60. doi: 10.1093/nar/gkv134. Epub 2015 Feb 26.
10
Evaluation of single-cell RNA-seq clustering algorithms on cancer tumor datasets.
Comput Struct Biotechnol J. 2022 Oct 26;20:6375-6387. doi: 10.1016/j.csbj.2022.10.029. eCollection 2022.

Cited by

1
Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation.
Front Bioinform. 2025 Feb 12;5:1519468. doi: 10.3389/fbinf.2025.1519468. eCollection 2025.
2
On Leveraging Machine Learning in Sport Science in the Hypothetico-deductive Framework.
Sports Med Open. 2024 Nov 14;10(1):124. doi: 10.1186/s40798-024-00788-4.
3
Omada: robust clustering of transcriptomes through multiple testing.
Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae039.
4
The effect of data transformation on low-dimensional integration of single-cell RNA-seq.
BMC Bioinformatics. 2024 Apr 30;25(1):171. doi: 10.1186/s12859-024-05788-5.
5
Robust, scalable, and informative clustering for diverse biological networks.
Genome Biol. 2023 Oct 12;24(1):228. doi: 10.1186/s13059-023-03062-0.
6
Big Data in Gastroenterology Research.
Int J Mol Sci. 2023 Jan 27;24(3):2458. doi: 10.3390/ijms24032458.
7
scIMC: a platform for benchmarking comparison and visualization analysis of scRNA-seq data imputation methods.
Nucleic Acids Res. 2022 May 20;50(9):4877-4899. doi: 10.1093/nar/gkac317.
8
Data-driven assessment of dimension reduction quality for single-cell omics data.
Patterns (N Y). 2022 Mar 11;3(3):100465. doi: 10.1016/j.patter.2022.100465.
9
Modelling the bioinformatics tertiary analysis research process.
BMC Bioinformatics. 2021 Sep 30;22(Suppl 13):452. doi: 10.1186/s12859-021-04310-5.

References

1
Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression.
Genome Biol. 2019 Dec 23;20(1):296. doi: 10.1186/s13059-019-1874-1.
2
Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters.
BMC Bioinformatics. 2019 Jul 1;20(1):369. doi: 10.1186/s12859-019-2951-x.
3
Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo.
Science. 2018 Jun 1;360(6392):981-987. doi: 10.1126/science.aar4362. Epub 2018 Apr 26.
4
Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis.
Science. 2018 Jun 1;360(6392). doi: 10.1126/science.aar3131. Epub 2018 Apr 26.
5
Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.
Nat Biotechnol. 2018 Jun;36(5):421-427. doi: 10.1038/nbt.4091. Epub 2018 Apr 2.
6
Comprehensive single-cell transcriptional profiling of a multicellular organism.
Science. 2017 Aug 18;357(6352):661-667. doi: 10.1126/science.aam8940.
7
Landscape of Infiltrating T Cells in Liver Cancer Revealed by Single-Cell Sequencing.
Cell. 2017 Jun 15;169(7):1342-1356.e16. doi: 10.1016/j.cell.2017.05.035.
8
Normalizing single-cell RNA sequencing data: challenges and opportunities.
Nat Methods. 2017 Jun;14(6):565-571. doi: 10.1038/nmeth.4292. Epub 2017 May 15.
9
SC3: consensus clustering of single-cell RNA-seq data.
Nat Methods. 2017 May;14(5):483-486. doi: 10.1038/nmeth.4236. Epub 2017 Mar 27.
10
Molecular interrogation of hypothalamic organization reveals distinct dopamine neuronal subtypes.
Nat Neurosci. 2017 Feb;20(2):176-188. doi: 10.1038/nn.4462. Epub 2016 Dec 19.