A workflow for simplified analysis of ATAC-cap-seq data in R.

Affiliation

Sainsbury Laboratory, Norwich Research Park, Norwich, UK, NR4 7UH.

Publication information

Gigascience. 2018 Jul 1;7(7). doi: 10.1093/gigascience/giy080.

DOI:10.1093/gigascience/giy080

PMID:29961827

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6047409/

Abstract

BACKGROUND

Assay for Transposase-Accessible Chromatin (ATAC)-cap-seq is a high-throughput sequencing method that combines ATAC-seq with targeted nucleic acid enrichment of precipitated DNA fragments. Working with a set of regions of interest that may be small in number and biologically dependent makes the analysis more difficult. Common statistical pipelines for RNA sequencing might be assumed to apply, but they can give misleading results on ATAC-cap-seq data. A tool is needed that allows a nonspecialist user to quickly and easily summarize data and apply sensible and effective normalization and analysis.

RESULTS

We developed atacR to allow a user to easily analyze their ATAC enrichment experiment. It provides comprehensive summary functions and diagnostic plots for studying enriched tag abundance. Application of between-sample normalization is made straightforward. Functions are provided for normalizing based on user-defined control regions, whole library size, and regions selected from the least variable regions in a dataset. Three methods for detecting differential abundance of tags from enriched methods are provided: bootstrap t, Bayes factor, and a wrapped version of the standard exact test in the edgeR package. We compared the precision, recall, and F-score of each detection method on resampled datasets across varying numbers of replicates, significance thresholds, and numbers of genes changed. The Bayes factor method had the greatest overall detection power, though edgeR was slightly stronger in simulations with lower numbers of genes changed.
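To make two of the ideas above concrete, the following is a minimal sketch of scaling-based between-sample normalization (by control regions or by whole library size) and of the precision, recall, and F-score used to compare detection methods. It is written in plain Python for illustration only and is not the atacR API: the function names `scale_factors`, `normalize`, and `precision_recall_f`, the data layout (a dictionary of per-window counts for each sample), and the choice to scale each sample to the mean total are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple


def scale_factors(
    counts: Dict[str, List[float]],
    control_windows: Optional[List[int]] = None,
) -> Dict[str, float]:
    """Return one scaling factor per sample (hypothetical helper).

    If control_windows is None, the total over all windows is used
    (whole-library-size scaling). Otherwise only the listed window
    indices are summed (control-region scaling). Each sample is scaled
    so that its total matches the mean total across samples; this
    assumes every sample has a non-zero total.
    """
    totals = {}
    for sample, values in counts.items():
        idx = range(len(values)) if control_windows is None else control_windows
        totals[sample] = sum(values[i] for i in idx)
    mean_total = sum(totals.values()) / len(totals)
    return {sample: mean_total / total for sample, total in totals.items()}


def normalize(
    counts: Dict[str, List[float]],
    control_windows: Optional[List[int]] = None,
) -> Dict[str, List[float]]:
    """Multiply every count in a sample by that sample's scaling factor."""
    factors = scale_factors(counts, control_windows)
    return {s: [v * factors[s] for v in values] for s, values in counts.items()}


def precision_recall_f(called: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Score a set of windows called as changed against the known changed set.

    precision = TP / |called|, recall = TP / |truth|,
    F-score = harmonic mean of precision and recall.
    Empty denominators are reported as 0.0.
    """
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

In a simulation like the one described, `truth` would be the set of windows deliberately changed in a resampled dataset and `called` the set a detection method reports at a given significance threshold; repeating the scoring across replicate numbers and thresholds gives the kind of comparison summarized above.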

CONCLUSIONS

Our package allows a nonspecialist user to easily and effectively apply methods appropriate to the analysis of ATAC-cap-seq in a reproducible manner. The package is implemented in pure R and is fully interoperable with common workflows in Bioconductor.
