Ohio State University.
Translational Data Analytics Institute at the Ohio State University.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa289.
Dropout events are prevalent in single-cell Hi-C (scHiC) data because of insufficient sequencing depth and data coverage, and they complicate downstream analyses such as clustering and structural analysis. To complicate matters further, dropouts are confounded with structural zeros (true zeros that arise from underlying properties rather than from undersampling), so the observed zeros are a mixture of both types of events. Although a great deal of progress has been made in imputing dropout events for single-cell RNA-sequencing (RNA-seq) data, little has been done to identify structural zeros and impute dropouts for scHiC data. In this paper, we adapted several methods from the single-cell RNA-seq literature for inference on observed zeros in scHiC data and evaluated their effectiveness. Through an extensive simulation study and real data analysis, we show that a couple of the adapted single-cell RNA-seq algorithms can be powerful for correctly identifying structural zeros and accurately imputing dropout values. Downstream analysis using the imputed values grouped cells of the same type together considerably better than clustering before imputation.
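To make the confounding of dropouts and structural zeros concrete, the following is a minimal illustrative sketch and not the simulation design used in the paper. The function name `simulate_observed_counts`, its parameters, the count distribution, and the assumption that a structural zero is shared by every cell are all hypothetical choices made for illustration only.

```python
import random


def simulate_observed_counts(n_cells, n_pairs, structural_rate,
                             dropout_rate, mean_count, seed=0):
    """Simulate contact counts in which zeros come from two sources.

    Assumptions (illustrative only): a structural zero is a property of
    the locus pair and is shared by all cells; a dropout hits a truly
    positive count independently in each cell.
    """
    rng = random.Random(seed)
    # Decide once per locus pair whether it is a structural zero.
    is_structural = [rng.random() < structural_rate for _ in range(n_pairs)]

    observed, labels = [], []
    for _ in range(n_cells):
        row, row_labels = [], []
        for j in range(n_pairs):
            if is_structural[j]:
                row.append(0)
                row_labels.append("structural")
                continue
            # True underlying count is always at least 1 here.
            true_count = 1 + int(rng.expovariate(1.0 / mean_count))
            if rng.random() < dropout_rate:
                # The contact exists but was not captured.
                row.append(0)
                row_labels.append("dropout")
            else:
                row.append(true_count)
                row_labels.append("observed")
        observed.append(row)
        labels.append(row_labels)
    return observed, labels


def zero_composition(observed, labels):
    """Count how many observed zeros are structural versus dropouts."""
    counts = {"structural": 0, "dropout": 0}
    for row, row_labels in zip(observed, labels):
        for value, label in zip(row, row_labels):
            if value == 0:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    obs, lab = simulate_observed_counts(
        n_cells=20, n_pairs=200, structural_rate=0.1,
        dropout_rate=0.5, mean_count=3.0)
    print(zero_composition(obs, lab))
```

In the observed matrix both kinds of zero look identical; only the simulated labels tell them apart. A method for this problem has to recover that distinction from the counts alone, leaving structural zeros at zero while imputing the dropouts.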