Miao Zhen, Wang Jianqiao, Park Kernyu, Kuang Da, Kim Junhyong
Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Department of Biology, University of Pennsylvania, Philadelphia, PA, USA.
Nat Commun. 2025 Jan 5;16(1):401. doi: 10.1038/s41467-024-55580-5.
Single-cell ATAC-seq (scATAC-seq) experimental designs have become increasingly complex, with multiple factors that might affect chromatin accessibility, including genotype, cell type, tissue of origin, sample location, and batch, whose compound effects are difficult to test with existing methods. In addition, current scATAC-seq data present statistical difficulties due to their sparsity and to variation in sequence capture across individual cells. To address these problems, we present a zero-adjusted statistical model, Probability model of Accessible Chromatin of Single cells (PACS), that allows complex hypothesis testing of accessibility-modulating factors while accounting for sparse and incomplete data. For differential accessibility analysis, PACS controls the false positive rate and achieves 17% to 122% higher power on average than existing tools. We demonstrate the effectiveness of PACS through several analysis tasks, including supervised cell type annotation, compound hypothesis testing, batch effect correction, and spatiotemporal modeling. We apply PACS to datasets from various tissues and show its ability to reveal previously undiscovered insights in scATAC-seq data.
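The idea of adjusting for incomplete capture can be illustrated with a toy simulation. This is only a minimal sketch under simplifying assumptions, not the PACS model itself: it assumes each cell has a known capture probability, that a peak is observed only when the region is both open and captured, and it uses a simple method-of-moments correction. All function and variable names (`simulate_peak`, `capture_adjusted_estimate`, `capture_probs`) are hypothetical.

```python
import random


def simulate_peak(p_open, capture_probs, rng):
    """Simulate observed 0/1 accessibility for one peak across cells.

    Assumption: a read is observed only if the region is open in the cell
    (probability p_open) AND the cell captures it (per-cell probability q).
    """
    observed = []
    for q in capture_probs:
        is_open = rng.random() < p_open
        is_captured = rng.random() < q
        observed.append(1 if (is_open and is_captured) else 0)
    return observed


def naive_estimate(observed):
    """Fraction of cells with a read; ignores incomplete capture."""
    return sum(observed) / len(observed)


def capture_adjusted_estimate(observed, capture_probs):
    """Method-of-moments estimate of p_open.

    Under the toy model E[sum(y)] = p_open * sum(q), so dividing the
    observed count by the total capture probability removes the bias.
    """
    return sum(observed) / sum(capture_probs)


rng = random.Random(0)  # fixed seed for reproducibility
true_p_open = 0.4
# Hypothetical per-cell capture probabilities, varying between cells.
capture_probs = [rng.uniform(0.2, 0.8) for _ in range(20000)]
observed = simulate_peak(true_p_open, capture_probs, rng)

naive = naive_estimate(observed)
adjusted = capture_adjusted_estimate(observed, capture_probs)
print(f"true={true_p_open:.3f} naive={naive:.3f} adjusted={adjusted:.3f}")
```

In this toy setting the naive fraction of nonzero cells underestimates the true open probability because many open regions are never captured, while the adjusted estimate recovers it on average. In real data the capture probabilities are not known and would themselves have to be estimated.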