Zhou Bing, Liu Quanzhong, Wang Meili, Wu Hao
School of Software, Shandong University, Jinan, Shandong, 250100, China.
College of Information Engineering, Northwest A&F University, 712100, Yangling, Shaanxi, China.
BMC Genomics. 2024 Sep 16;22(Suppl 5):922. doi: 10.1186/s12864-024-10764-7.
Cell type prediction is crucial to cell type identification in genomics, cancer diagnosis, and drug development, and it can relieve the time-consuming and difficult task of classifying cells in biological experiments. A computational method is therefore urgently needed to classify and predict cell types from single-cell Hi-C data. Previous studies have lacked a convenient and accurate method for predicting cell types from such data. Deep neural networks can form complex representations of single-cell Hi-C data, which makes it possible to handle these multidimensional and sparse biological datasets.
We compare the performance of SCANN with that of existing methods and analyze the model using five evaluation metrics. When only the ML1 and ML3 datasets are used, the adjusted Rand index (ARI) and normalized mutual information (NMI) values of SCANN are 14% and 11% higher, respectively, than those of scHiCluster. When all six libraries are used, the ARI and NMI values of SCANN are 63% and 88% higher, respectively, than those of scHiCluster. These findings show that SCANN is highly accurate in predicting the cell type of independent cell samples from single-cell Hi-C data.
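For readers unfamiliar with the two clustering metrics named above, the following is a minimal sketch of how the adjusted Rand index (ARI) and normalized mutual information (NMI) can be computed from a list of true cell-type labels and a list of predicted labels. It is not the authors' evaluation code. The function names are hypothetical, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the abstract does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the pair-counting contingency table (Hubert and Arabie form)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as trivially perfect agreement
    # Count pairs of cells that share a cluster in both, true-only, pred-only.
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    true_only = sum(comb(c, 2) for c in Counter(labels_true).values())
    pred_only = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = true_only * pred_only / total_pairs
    maximum = (true_only + pred_only) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (both - expected) / (maximum - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI normalized by the arithmetic mean of the two label entropies
    (an assumed normalization; other variants exist)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n == 0:
        return 1.0
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    # Mutual information between the two labelings (natural log).
    mi = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * log(c / n) for c in count_pred.values())
    mean_entropy = (h_true + h_pred) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings consist of a single cluster
    return mi / mean_entropy


# Toy labels only, not data from the study.
true_types = [0, 0, 1, 2]
predicted = [0, 0, 1, 1]
print(round(adjusted_rand_index(true_types, predicted), 4))
print(round(normalized_mutual_info(true_types, predicted), 4))
```

Both scores equal 1 when the predicted grouping matches the true cell types exactly, regardless of how the clusters are numbered. ARI is close to 0 for random assignments and can be negative.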
SCANN improves training speed and requires fewer resources for predicting cell types. In addition, when the numbers of cells in different cell types are extremely unbalanced, SCANN shows higher stability and flexibility in cell classification and cell type prediction from single-cell Hi-C data. This prediction method can help biologists study differences in chromosome structure between cell types.