Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA, USA.
CCR-SF Bioinformatics Group, Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
Nat Biotechnol. 2021 Sep;39(9):1103-1114. doi: 10.1038/s41587-020-00748-9. Epub 2020 Dec 21.
Comparing diverse single-cell RNA sequencing (scRNA-seq) datasets generated by different technologies and in different laboratories remains a major challenge. Here we address the need for guidance in choosing algorithms that lead to accurate biological interpretation of varied data types acquired on different platforms. Using two well-characterized cellular reference samples (breast cancer cells and B cells), captured either separately or in mixtures, we compared different scRNA-seq platforms and several preprocessing, normalization and batch-effect correction methods at multiple centers. Although preprocessing and normalization contributed to variability in gene detection and cell classification, batch-effect correction was by far the most important factor in correctly classifying the cells. Moreover, scRNA-seq dataset characteristics (for example, sample and cellular heterogeneity and the platform used) were critical in determining the optimal bioinformatic method. However, reproducibility across centers and platforms was high when appropriate bioinformatic methods were applied. Our findings offer practical guidance for optimizing platform and software selection when designing an scRNA-seq study.