Nasrollahi Fatemeh Sadat Fatemi, Silva Filipi Nascimento, Liu Shiwei, Chaudhuri Soumilee, Yu Meichen, Wang Juexin, Nho Kwangsik, Saykin Andrew J, Bennett David A, Sporns Olaf, Fortunato Santo
Observatory of Social Media, Luddy School of Informatics, Computing, and Engineering, Indiana University, Indiana, USA.
Center for Neuroimaging and the Indiana Alzheimer's Disease Research Center, Indiana University, Indiana, USA.
bioRxiv. 2024 Dec 7:2024.12.04.626793. doi: 10.1101/2024.12.04.626793.
Single-cell RNA sequencing (scRNA-seq) technologies provide unprecedented resolution, capturing the transcriptome at the level of individual cells. One of the biggest challenges in scRNA-seq data analysis is cell type annotation, which is usually inferred by cell separation approaches. In silico algorithms that accurately identify individual cell types in ongoing single-cell sequencing studies are crucial for unlocking cellular heterogeneity and understanding the biological basis of diseases. In this study, we focus on robustly identifying cell types in scRNA-seq data. We conduct a comparative analysis of methods established in biology, such as Seurat, Leiden, and WGCNA, as well as Infomap, statistical inference via Stochastic Block Models (SBM), and single-cell Graph Neural Networks (scGNN). We also analyze preprocessing pipelines to identify and optimize their key components. Leveraging two independent datasets, PBMC and ROSMAP, we apply clustering algorithms to cell-cell networks derived from gene expression data. Our findings reveal that while clusters detected by WGCNA exhibit limited correspondence with cell types, those identified by multiresolution Infomap and Leiden, as well as by SBM, show a closer alignment, with Infomap standing out as a particularly effective approach. Infomap notably offers valuable insights for the precise characterization of cellular landscapes related to neurodegeneration and immunology in scRNA-seq.
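The general workflow the abstract describes (build a cell-cell network from gene expression, cluster it, then compare clusters with known cell types) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' pipeline: the toy expression values, the Euclidean k-nearest-neighbor graph, the choice of `k = 2`, and the helper names `knn_graph`, `connected_components`, and `adjusted_rand_index` are all hypothetical. Connected components serve only as a simple placeholder for a real community detection method such as Infomap, Leiden, or SBM inference, and the adjusted Rand index is one common agreement score, not necessarily the one used in the study.

```python
from collections import Counter
from math import comb, dist


def knn_graph(cells, k):
    """Build an undirected graph linking each cell to its k nearest cells
    (Euclidean distance on expression vectors; an illustrative choice)."""
    adj = {i: set() for i in range(len(cells))}
    for i, ci in enumerate(cells):
        others = sorted(
            (j for j in range(len(cells)) if j != i),
            key=lambda j: dist(ci, cells[j]),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Label nodes by connected component.
    Placeholder for a real clustering method (Infomap, Leiden, SBM)."""
    labels = {}
    current = 0
    for start in adj:
        if start in labels:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in labels:
                continue
            labels[node] = current
            stack.extend(n for n in adj[node] if n not in labels)
        current += 1
    return [labels[i] for i in range(len(adj))]


def adjusted_rand_index(truth, pred):
    """Chance-corrected agreement between two labelings of the same cells."""
    n_pairs = comb(len(truth), 2)
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_truth * same_pred / n_pairs
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (same_both - expected) / (maximum - expected)


# Toy data (made up): 6 cells x 3 genes, two well-separated cell types.
cells = [
    [10, 1, 0], [9, 2, 0], [11, 1, 1],   # hypothetical type "A"
    [0, 1, 10], [1, 2, 9], [0, 0, 11],   # hypothetical type "B"
]
truth = ["A", "A", "A", "B", "B", "B"]

pred = connected_components(knn_graph(cells, k=2))
ari = adjusted_rand_index(truth, pred)
print(pred, ari)
```

On real data the graph would be far larger and noisier, so the clustering step and its resolution settings would matter much more than in this toy case, which is the kind of comparison the study reports.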