Random forest based similarity learning for single cell RNA sequencing data.

Author information

Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA, USA.

Department for Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i79-i88. doi: 10.1093/bioinformatics/bty260.

Abstract

MOTIVATION

Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore, obtaining accurate cell-cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, abundance and type of noise present in scRNA-seq datasets suggest that application of generic methods, or of methods developed for bulk RNA-seq data, is likely suboptimal.

RESULTS

Here, we present RAFSIL, a random forest based approach to learn cell-cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data.
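The idea of deriving cell-cell similarities from a forest of trees can be illustrated with a small sketch. This is not the RAFSIL algorithm: the abstract does not specify its feature construction or forest training, so the code below swaps in a generic stand-in, completely random partition trees, and scores two cells by the fraction of trees in which they land in the same leaf. The function names (`random_partition`, `forest_similarity`), the parameter values, and the toy data are all hypothetical and chosen only for illustration.

```python
import random


def random_partition(data, idx, rng, depth, min_size, leaves):
    """Recursively split the cells in `idx` with random axis-aligned cuts."""
    if depth == 0 or len(idx) <= min_size:
        leaves.append(idx)
        return
    feature = rng.randrange(len(data[0]))   # pick one feature at random
    values = [data[i][feature] for i in idx]
    cut = rng.uniform(min(values), max(values))  # threshold within observed range
    left = [i for i in idx if data[i][feature] < cut]
    right = [i for i in idx if data[i][feature] >= cut]
    if not left or not right:               # degenerate cut (e.g. constant feature)
        leaves.append(idx)
        return
    random_partition(data, left, rng, depth - 1, min_size, leaves)
    random_partition(data, right, rng, depth - 1, min_size, leaves)


def forest_similarity(data, n_trees=300, depth=6, min_size=3, seed=0):
    """Similarity of two cells = fraction of trees in which they share a leaf."""
    rng = random.Random(seed)
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        leaves = []
        random_partition(data, list(range(n)), rng, depth, min_size, leaves)
        for leaf in leaves:
            for i in leaf:
                for j in leaf:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Toy data (an assumption, not real expression values): two groups of five
# "cells" with three features each, drawn from well-separated ranges.
toy_rng = random.Random(1)
group_a = [[toy_rng.uniform(0, 1) for _ in range(3)] for _ in range(5)]
group_b = [[toy_rng.uniform(10, 11) for _ in range(3)] for _ in range(5)]
cells = group_a + group_b

sim = forest_similarity(cells)
```

A matrix like `sim` is symmetric with ones on the diagonal, so `1 - sim[i][j]` can serve as a dissimilarity for the exploratory tasks the abstract mentions, such as clustering or low-dimensional embedding. On real scRNA-seq data, which contains many zero counts, this naive splitting rule would often hit constant features and stop early, which is one reason a method tailored to such data would need more careful feature construction.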

AVAILABILITY AND IMPLEMENTATION

The RAFSIL R package is available at www.kostkalab.net/software.html.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

