Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Genes Cells. 2024 Jan;29(1):5-16. doi: 10.1111/gtc.13082. Epub 2023 Nov 21.
Assay for Transposase-Accessible Chromatin using high-throughput sequencing (ATAC-seq) is a popular next-generation sequencing (NGS) technique for measuring chromatin accessibility and identifying open chromatin regions. Although various bioinformatics methods use the shape of NGS read alignments together with intensity information, few studies have focused on shape information alone. In this study, we investigated what types of ATAC-seq read alignment shapes are observed in promoter regions and whether this pure shape information is related to other gene features. We introduced a novel concept and pipeline that treats the pure shape information of NGS data as probability distributions and quantifies their dissimilarities using information theory. Based on this concept, we demonstrate that the pure shape information of ATAC-seq data is correlated with chromatin openness and with some gene characteristics. On the other hand, our results suggest that the pure shape information of ATAC-seq read alignments is unlikely to contain additional information that explains differences in RNA expression. Our study suggests that viewing the read alignment shape of NGS data as a probability distribution makes it possible to capture the characteristics of the genome-wide landscape of such data in a non-parametric manner.
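The idea of treating an alignment shape as a probability distribution can be illustrated with a minimal sketch. It assumes a fixed-width promoter window with per-bin read coverage, and it uses the Jensen-Shannon divergence as one possible information-theoretic dissimilarity; the abstract does not state which measure the authors use, so that choice, the pseudocount, the toy coverage vectors, and the function names `shape_distribution` and `jensen_shannon_divergence` are all assumptions made for illustration.

```python
import numpy as np


def shape_distribution(coverage, pseudocount=1e-9):
    """Normalize per-bin read coverage so that it sums to 1.

    Dividing by the total discards intensity (read depth) and keeps
    only the shape. The small pseudocount is an assumption that keeps
    the logarithms below finite for empty bins.
    """
    cov = np.asarray(coverage, dtype=float) + pseudocount
    return cov / cov.sum()


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits between two distributions."""
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log2(p / m))
    kl_qm = np.sum(q * np.log2(q / m))
    return 0.5 * (kl_pm + kl_qm)


# Hypothetical coverage over five bins of a promoter window.
sharp = shape_distribution([0, 1, 8, 1, 0])
sharp_deeper = shape_distribution([0, 10, 80, 10, 0])  # same shape, 10x depth
broad = shape_distribution([2, 2, 2, 2, 2])

same_shape = jensen_shannon_divergence(sharp, sharp_deeper)
different_shape = jensen_shannon_divergence(sharp, broad)
```

In this sketch the two sharp profiles differ tenfold in depth but yield a divergence close to zero, whereas the sharp and broad profiles yield a clearly positive value, which is the sense in which the comparison depends on shape and not on intensity.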