Miao Zhen, Wang Jianqiao, Park Kernyu, Kuang Da, Kim Junhyong
Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Department of Biology, University of Pennsylvania, Philadelphia, PA, USA.
bioRxiv. 2024 Mar 24:2023.07.30.551108. doi: 10.1101/2023.07.30.551108.
Single-nucleus ATAC-seq (snATAC-seq) experimental designs have become increasingly complex, with multiple factors that might affect chromatin accessibility, including genotype, cell type, tissue of origin, sample location, and batch, whose compound effects are difficult to test with existing methods. In addition, current snATAC-seq data present statistical difficulties due to their sparsity and variations in individual sequence capture. To address these problems, we present a zero-adjusted statistical model, Probability model of Accessible Chromatin of Single cells (PACS), that allows complex hypothesis testing of factors that affect accessibility while accounting for sparse and incomplete data. For differential accessibility analysis, PACS controls the false positive rate and achieves, on average, 17% to 122% higher power than existing tools. We demonstrate the effectiveness of PACS through several analysis tasks, including supervised cell type annotation, compound hypothesis testing, batch effect correction, and spatiotemporal modeling. We apply PACS to several datasets from a variety of tissues and show its ability to reveal previously undiscovered insights in snATAC-seq data.