Unsupervised Contrastive Peak Caller for ATAC-seq.

Authors

Vu Ha T H, Zhang Yudi, Tuteja Geetu, Dorman Karin

Affiliations

Bioinformatics and Computational Biology Program, Iowa State University, Ames IA 50011, USA.

Department of Genetics, Development and Cell Biology, Iowa State University, Ames IA 50011, USA.

Publication information

bioRxiv. 2023 Jan 8:2023.01.07.523108. doi: 10.1101/2023.01.07.523108.

DOI:10.1101/2023.01.07.523108

PMID:36712015

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9881890/

Abstract

The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify chromatin accessible regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as "peak calling". Most unsupervised peak calling methods are based on simple statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high quality labeled data for training, which can be difficult to obtain. Moreover, though biological replicates are recognized to be important, there are no established approaches for using replicates in the deep learning tools, and the approaches available for traditional methods either cannot be applied to ATAC-seq, where control samples may be unavailable, or are post-hoc and do not capitalize on potentially complex, but reproducible signal in the read enrichment data. Here, we propose a novel peak caller that uses unsupervised contrastive learning to extract shared signals from multiple replicates. Raw coverage data are encoded to obtain low-dimensional embeddings and optimized to minimize a contrastive loss over biological replicates. These embeddings are passed to another contrastive loss for learning and predicting peaks and decoded to denoised data under an autoencoder loss. We compared our Replicative Contrastive Learner (RCL) method with other existing methods on ATAC-seq data, using annotations from ChromHMM genome and transcription factor ChIP-seq as noisy truth. RCL consistently achieved the best performance.
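
The abstract describes optimizing embeddings to minimize a contrastive loss over biological replicates, but it does not give the exact form of that loss. As a rough illustration only, the sketch below uses a generic InfoNCE-style loss, which may differ from what RCL actually implements. The function names, the toy embeddings, and the temperature value are all assumptions made for this example. It treats the embedding of the same genomic region in two replicates as a positive pair and all other regions as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def replicate_contrastive_loss(rep_a, rep_b, temperature=0.5):
    """Generic InfoNCE-style loss (an assumption, not the RCL loss itself).

    rep_a[i] and rep_b[i] are embeddings of the same region in two
    replicates. The loss is small when region i in replicate A is more
    similar to region i in replicate B than to any other region in B.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same regions")
    total = 0.0
    for i, a in enumerate(rep_a):
        logits = [cosine(a, b) / temperature for b in rep_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(rep_a)


# Toy 2-dimensional embeddings for two regions (hypothetical values).
rep_a = [[1.0, 0.0], [0.0, 1.0]]
agree = [[0.9, 0.1], [0.1, 0.9]]      # replicate B agrees with A
swapped = [[0.1, 0.9], [0.9, 0.1]]    # replicate B disagrees with A

loss_agree = replicate_contrastive_loss(rep_a, agree)
loss_swapped = replicate_contrastive_loss(rep_a, swapped)
print(loss_agree < loss_swapped)
```

In this toy case the loss is lower when the two replicates place matching regions close together, which is the behavior a replicate-based contrastive objective is meant to reward. A real model would compute such a loss on encoder outputs and combine it with the other terms the abstract mentions, such as the autoencoder reconstruction loss.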

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84fe/9881890/9397d63c90a3/nihpp-2023.01.07.523108v1-f0001.jpg
