Baek Seungbyn, Lee Insuk
Department of Biotechnology, College of Life Science & Biotechnology, Yonsei University, Seoul 03722, Korea.
Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Korea.
Comput Struct Biotechnol J. 2020 Jun 12;18:1429-1439. doi: 10.1016/j.csbj.2020.06.012. eCollection 2020.
Most genetic variations associated with human complex traits are located in non-coding genomic regions. Therefore, understanding the genotype-to-phenotype axis requires a comprehensive catalog of functional non-coding genomic elements, most of which are involved in epigenetic regulation of gene expression. Genome-wide maps of open chromatin regions can facilitate functional analysis of cis- and trans-regulatory elements via their connections with trait-associated sequence variants. Currently, the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) is considered the most accessible and cost-effective strategy for genome-wide profiling of chromatin accessibility. Single-cell ATAC-seq (scATAC-seq) technology has also been developed to study cell type-specific chromatin accessibility in tissue samples containing heterogeneous cell populations. However, because scATAC-seq data are intrinsically noisy and sparse, accurately extracting biological signals and devising effective biological hypotheses are difficult. To overcome these limitations in scATAC-seq data analysis, new methods and software tools have been developed over the past few years. Nevertheless, there is not yet a consensus on best practices for scATAC-seq data analysis. In this review, we discuss scATAC-seq technology and data analysis methods, ranging from preprocessing to downstream analysis, along with an up-to-date list of published studies that have applied this method. We expect this review to provide guidance on successful data generation and analysis using appropriate software tools and databases for the study of chromatin accessibility at single-cell resolution.