Kwok Aaron Wing Cheung, Shim Heejung, McCarthy Davis J
Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, Fitzroy, VIC 3065, Australia.
Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC, 3010, Australia.
bioRxiv. 2024 Dec 9:2024.12.04.626927. doi: 10.1101/2024.12.04.626927.
Single-cell Assay for Transposase Accessible Chromatin with sequencing (scATAC-seq) has become a widely used method for investigating chromatin accessibility at single-cell resolution. However, the resulting data are highly sparse, with most entries being zeros. Currently available computational methods for scATAC-seq therefore apply a range of transformation procedures to extract meaningful information from the sparse data. These transformations fall into three main categories: 1) feature aggregation based on known biological associations, 2) pseudo-bulking of cells with similar biology, and 3) binarisation of count data. These strategies raise the question of whether scATAC-seq data actually contain usable single-cell, single-region information, as the assay intends. Going beyond aggregated features and pooled cells would open up the possibility of more complex statistical tasks that require that degree of granularity. Reaching the finest possible resolution of single-cell, single-region information inevitably involves overcoming many computational challenges. Here, we review the major data analysis challenges lying between raw data readout and biological discovery, and discuss the limitations of current data analysis approaches. Lastly, we conclude that chromatin accessibility profiling at true single-cell resolution has not yet been achieved with current technology, but that it may become achievable through promising developments in optimising the efficiency of scATAC-seq assays.
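As a rough illustration of the three transformation categories named in the abstract, the following Python sketch applies each one to a toy cell-by-peak count matrix. The matrix values, the peak-to-gene mapping, the cluster labels, and all function names are invented for illustration; they are not taken from the paper or from any scATAC-seq package.

```python
from collections import defaultdict

# Toy cell-by-peak count matrix (rows: cells, columns: peaks). Values are invented.
counts = [
    [0, 2, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 3],
    [0, 0, 0, 0, 0, 1],
]
# Hypothetical annotations: the gene each peak is linked to, and a cluster per cell.
peak_to_gene = ["geneA", "geneA", "geneB", "geneB", "geneC", "geneC"]
cell_cluster = ["c1", "c1", "c2", "c2"]


def sparsity(matrix):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(value == 0 for row in matrix for value in row)
    return zeros / total


def aggregate_features(matrix, feature_groups):
    """1) Feature aggregation: sum peak counts within each feature group, per cell."""
    groups = sorted(set(feature_groups))
    aggregated = []
    for row in matrix:
        sums = defaultdict(int)
        for value, group in zip(row, feature_groups):
            sums[group] += value
        aggregated.append([sums[group] for group in groups])
    return groups, aggregated


def pseudobulk(matrix, cell_groups):
    """2) Pseudo-bulking: sum counts over all cells that share a group label."""
    pooled = {group: [0] * len(matrix[0]) for group in sorted(set(cell_groups))}
    for row, group in zip(matrix, cell_groups):
        for j, value in enumerate(row):
            pooled[group][j] += value
    return pooled


def binarise(matrix):
    """3) Binarisation: keep only whether a peak was observed in a cell."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


genes, gene_counts = aggregate_features(counts, peak_to_gene)
pooled_counts = pseudobulk(counts, cell_cluster)
binary_counts = binarise(counts)

print("peak-level sparsity:", sparsity(counts))       # 0.75
print("gene-level sparsity:", sparsity(gene_counts))  # 0.5
print("pseudo-bulk profiles:", pooled_counts)
```

In this toy case, aggregation and pseudo-bulking both reduce the proportion of zeros, but only by giving up single-region or single-cell resolution respectively, while binarisation discards the count magnitude. That is the trade-off the abstract questions.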