Bernstein Matthew N, Ma Zhongjie, Gleicher Michael, Dewey Colin N
Morgridge Institute for Research, Madison, WI 53715, USA.
Department of Computer Sciences, University of Wisconsin - Madison, Madison, WI 53706, USA.
iScience. 2020 Dec 8;24(1):101913. doi: 10.1016/j.isci.2020.101913. eCollection 2021 Jan 22.
Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification of cell clusters by considering the rich hierarchical structure of known cell types. Furthermore, CellO comes pre-trained on a comprehensive data set of healthy, untreated human primary samples in the Sequence Read Archive. This comprehensive training set enables CellO to run out of the box on diverse cell types and to achieve performance that is competitive with, or even superior to, existing state-of-the-art methods. Lastly, CellO's linear models are easily interpreted, thereby enabling exploration of cell-type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO's models across the ontology.