University of Connecticut, Storrs, 06269, CT, USA.
BMC Genomics. 2018 Aug 13;19(Suppl 6):569. doi: 10.1186/s12864-018-4922-4.
Single cell transcriptomics is critical for understanding cellular heterogeneity and identifying novel cell types. Leveraging recent advances in single cell RNA sequencing (scRNA-Seq) technology requires novel unsupervised clustering algorithms that are robust to high levels of technical and biological noise and that scale to datasets of millions of cells.
We present novel computational approaches for clustering scRNA-Seq data based on the Term Frequency - Inverse Document Frequency (TF-IDF) transformation, which has been used successfully in text analysis.
Experimental results show that TF-IDF methods consistently outperform commonly used scRNA-Seq clustering approaches.
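To illustrate the TF-IDF transformation named above, here is a minimal sketch that applies the standard text-analysis definition to a count matrix, treating each cell as a document and each gene as a term. The abstract does not give the exact weighting used in the paper, so the formula (count fraction times log(N / df)), the function name `tfidf_transform`, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def tfidf_transform(counts):
    """Hypothetical helper: TF-IDF scores for a cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_cells, n_genes = counts.shape

    # TF: each gene's count as a fraction of the cell's total count.
    totals = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

    # DF: number of cells in which each gene is detected.
    df = (counts > 0).sum(axis=0)

    # IDF: log(N / df); genes detected in no cell get weight 0.
    idf = np.zeros(n_genes)
    detected = df > 0
    idf[detected] = np.log(n_cells / df[detected])

    return tf * idf

# Toy example (made-up counts): 3 cells x 3 genes.
counts = np.array([
    [5, 0, 5],
    [0, 10, 10],
    [2, 0, 8],
])
scores = tfidf_transform(counts)
```

Under this weighting, a gene detected in every cell (the third column here) receives a score of zero, while genes detected in few cells are up-weighted. The resulting matrix could then be passed to any standard clustering routine.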