Random forest based similarity learning for single cell RNA sequencing data.

Affiliations

Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA, USA.

Department for Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i79-i88. doi: 10.1093/bioinformatics/bty260.

PMID:29950006

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6022547/

Abstract

MOTIVATION

Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around the discovery or characterization of cell types or sub-types; therefore, obtaining accurate cell-cell similarities from scRNA-seq data is a critical step in many studies. While tools for scRNA-seq data analysis are advancing rapidly, few approaches explicitly address this task. Furthermore, the abundance and type of noise in scRNA-seq datasets suggest that applying generic methods, or methods developed for bulk RNA-seq data, is likely suboptimal.

RESULTS

Here, we present RAFSIL, a random forest based approach to learn cell-cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data.
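The abstract does not spell out how a random forest yields a cell-cell similarity. One widely used construction, assumed here rather than taken from the paper, is Breiman's unsupervised random forest: train a forest to separate the real cells from a synthetic copy whose features have been shuffled independently, then score two cells as similar when they often land in the same leaf. The Python sketch below shows only those two generic pieces, assuming a cells-by-features matrix that has already been through feature construction. It is not code from the RAFSIL package, and the names `make_synthetic`, `proximity` and `to_distance` are hypothetical. Training the forest is left to any random forest library; `proximity` expects the per-tree leaf indices such a forest produces (for example, the array returned by scikit-learn's `RandomForestClassifier.apply`).

```python
# Generic unsupervised random-forest proximity sketch (assumption; not RAFSIL's code).
import random
from typing import List, Sequence


def make_synthetic(X: Sequence[Sequence[float]], seed: int = 0) -> List[List[float]]:
    """Build a synthetic copy of the cells-by-features matrix X by shuffling
    each feature (column) independently. Each feature keeps its marginal
    distribution, but the dependencies between features are destroyed, so a
    forest trained on real vs. synthetic cells must learn that structure."""
    rng = random.Random(seed)
    n_cells, n_features = len(X), len(X[0])
    shuffled_cols = []
    for j in range(n_features):
        col = [X[i][j] for i in range(n_cells)]
        rng.shuffle(col)
        shuffled_cols.append(col)
    # Transpose back to a cells-by-features layout.
    return [[shuffled_cols[j][i] for j in range(n_features)] for i in range(n_cells)]


def proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """leaves[i][t] is the index of the leaf that cell i reaches in tree t.
    The proximity of two cells is the fraction of trees in which they share
    a leaf, giving a symmetric matrix with ones on the diagonal."""
    n_cells = len(leaves)
    n_trees = len(leaves[0])
    S = [[0.0] * n_cells for _ in range(n_cells)]
    for i in range(n_cells):
        for k in range(i, n_cells):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[k][t])
            S[i][k] = S[k][i] = shared / n_trees
    return S


def to_distance(S: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn proximities into dissimilarities (1 - S) for clustering or embedding."""
    return [[1.0 - s for s in row] for row in S]


# Hypothetical leaf indices for 3 cells in 3 trees.
leaves = [[0, 1, 2],
          [0, 1, 3],
          [4, 5, 2]]
S = proximity(leaves)
# S[0][1] == 2/3 (cells 0 and 1 share a leaf in trees 0 and 1); S[1][2] == 0.0
D = to_distance(S)
```

Using 1 - S is one common way to feed proximities into distance-based clustering or embedding methods; other transformations are possible, and the sketch makes no claim about which one RAFSIL uses.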

AVAILABILITY AND IMPLEMENTATION

The RAFSIL R package is available at www.kostkalab.net/software.html.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8730/6022547/0a74313a6799/bty260f1.jpg
