Hu Jian, Li Xiangjie, Hu Gang, Lyu Yafei, Susztak Katalin, Li Mingyao
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China.
Nat Mach Intell. 2020 Oct;2(10):607-618. doi: 10.1038/s42256-020-00233-7. Epub 2020 Oct 5.
Clustering and cell type classification are important steps in single-cell RNA-seq (scRNA-seq) analysis. As more scRNA-seq data become available, supervised cell type classification methods that utilize external, well-annotated source data are starting to gain popularity over unsupervised clustering algorithms. However, the performance of existing supervised methods is highly dependent on source data quality, and they often have limited accuracy in classifying cell types that are missing from the source data. To overcome these limitations, we developed ItClust, a transfer learning algorithm that borrows ideas from supervised cell type classification algorithms but also leverages information in the target data to ensure sensitivity in classifying cells that are present only in the target data. Through extensive evaluations using data from different species and tissues generated with diverse scRNA-seq protocols, we show that ItClust significantly improves clustering and cell type classification accuracy over popular unsupervised clustering and supervised cell type classification algorithms.