School of Mathematics and Statistics, Shandong University (Weihai), Weihai, 264209, China.
Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
BMC Bioinformatics. 2020 Oct 7;21(1):440. doi: 10.1186/s12859-020-03797-8.
Advances in single-cell RNA-seq technology have created great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods have markedly different effects on the performance of clustering algorithms. Moreover, no single preprocessing method is suitable for all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data.
We designed a graph-based algorithm, SC3-e, specifically to identify the best data preprocessing method for SC3, currently the most widely used algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always selects the best data preprocessing method for SC3 and thereby greatly improves the clustering performance of SC3.
The SC3-e algorithm is effective in practice at identifying the best data preprocessing method, and therefore substantially improves the cell-type clustering performance of SC3. It is expected to play a crucial role in studies that rely on single-cell clustering, such as research on complex human diseases and the discovery of new cell types.