Domanskyi Sergii, Hakansson Alex, Bertus Thomas J, Paternostro Giovanni, Piermarocchi Carlo
Department of Physics and Astronomy, Michigan State University, East Lansing, MI, USA.
Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA.
PeerJ. 2021 Jan 13;9:e10670. doi: 10.7717/peerj.10670. eCollection 2021.
Analysis of single-cell RNA sequencing (scRNA-seq) data typically consists of several steps, including quality control, batch correction, clustering, cell identification and characterization, and visualization. The amount of scRNA-seq data is growing rapidly, and novel algorithmic approaches that improve these steps are key to extracting more biological information. Here, we introduce: (i) two methods for automatic cell type identification (i.e., without an expert curator), based on a voting algorithm and on a Hopfield classifier, (ii) a method for cell anomaly quantification based on isolation forest, and (iii) a tool for the visualization of cell phenotypic landscapes based on Hopfield energy-like functions. These new approaches are integrated into a software platform that includes many other state-of-the-art methodologies and provides a self-contained toolkit for scRNA-seq analysis.
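To illustrate the idea behind a Hopfield classifier and its energy function, here is a minimal sketch of a textbook Hopfield network in pure Python. It is not the DCS implementation, whose details the abstract does not give. The marker signatures, the six-marker panel, and all function names are hypothetical, and expression is assumed to be binarized to +1 (high) or -1 (low).

```python
# Textbook Hopfield network sketch (NOT the DCS code).
# Assumption: each cell type is a +1/-1 pattern over a hypothetical marker panel.
signatures = {
    "T cell":   [1, 1, -1, -1, -1, -1],
    "B cell":   [-1, -1, 1, 1, -1, -1],
    "Monocyte": [-1, -1, -1, -1, 1, 1],
}


def hebbian_weights(patterns):
    """Build the symmetric coupling matrix with the Hebbian rule (zero diagonal)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w, state):
    """Hopfield energy E = -1/2 * sum_ij w_ij s_i s_j; stored patterns sit in minima."""
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def relax(w, state, max_sweeps=10):
    """Asynchronous sign updates until no marker flips (or max_sweeps is reached)."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * s[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s


def classify(w, signatures, state):
    """Relax the state, then report the signature with the largest overlap."""
    final = relax(w, state)
    overlaps = {name: sum(a * b for a, b in zip(final, p)) / len(p)
                for name, p in signatures.items()}
    return max(overlaps, key=overlaps.get), final


weights = hebbian_weights(list(signatures.values()))
noisy_cell = [1, -1, -1, -1, -1, -1]  # T-cell-like profile with one marker dropped out
label, relaxed = classify(weights, signatures, noisy_cell)
```

In this toy setting the noisy profile relaxes toward the nearest stored signature, and `energy(weights, noisy_cell)` is higher than `energy(weights, relaxed)`. Evaluating such an energy over many cell states is one way to picture a phenotypic landscape in which cell types sit in basins.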
We present a suite of software elements for the analysis of scRNA-seq data. This Python-based open-source software, Digital Cell Sorter (DCS), is an extensive toolkit of methods for scRNA-seq analysis. We illustrate the capabilities of the software using large datasets of peripheral blood mononuclear cells (PBMC), as well as plasma cells from bone marrow samples of healthy donors and multiple myeloma patients. We test the novel algorithms by evaluating their ability to deconvolve cell mixtures and to detect small numbers of anomalous cells in PBMC data.
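The anomaly test described above can be pictured with a small sketch based on scikit-learn's `IsolationForest`. This is an illustration under stated assumptions, not the DCS interface: the synthetic "expression" matrix, the five planted outlier cells, and the helper name `anomaly_scores` are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def anomaly_scores(expression, random_state=0):
    """Return one score per cell; lower values mean the cell is easier to isolate."""
    forest = IsolationForest(n_estimators=100, random_state=random_state)
    forest.fit(expression)
    # score_samples is the negated anomaly score, so lower = more anomalous
    return forest.score_samples(expression)


# Synthetic stand-in for a cells x features matrix (assumption, not real PBMC data)
rng = np.random.default_rng(0)
normal_cells = rng.normal(0.0, 1.0, size=(200, 10))  # bulk population
odd_cells = rng.normal(8.0, 1.0, size=(5, 10))       # small planted anomalous group
expression = np.vstack([normal_cells, odd_cells])

scores = anomaly_scores(expression)
```

Here the last five rows are the planted cells, so comparing `scores[200:]` with `scores[:200]` shows whether the small anomalous group is ranked below the bulk population.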
The DCS toolkit is available for download and installation through the Python Package Index (PyPI). After installation, the software is loaded with a standard Python import statement. Source code is also available for download on Zenodo: DOI 10.5281/zenodo.2533377.
Supplemental Materials are available at PeerJ online.