Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats.

Authors

Morrissey Alexis, Shi Jeffrey, James Daniela Q, Mahony Shaun

Affiliation

Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA.

Publication information

bioRxiv. 2023 Sep 15:2023.09.12.556916. doi: 10.1101/2023.09.12.556916.

Abstract

Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. Unfortunately, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multi-mapped" reads that align equally well to multiple genomic locations. Since multi-mapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multi-mapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multi-mapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multi-mapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq datasets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly effective in identifying ChIP-seq peaks in younger TEs, which hold evolutionary significance due to their emergence during human evolution from primates.
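
To make the idea of probabilistic allocation concrete, the following is a minimal sketch of one common generic strategy: assigning a multi-mapped read to one of its candidate loci with probability proportional to the number of uniquely mapped reads nearby. It is not Allo's implementation. The abstract does not specify Allo's scoring scheme, and the convolutional neural network component is omitted entirely. All function names, the window half-width, and the pseudocount are hypothetical choices made for illustration.

```python
import random
from bisect import bisect_left, bisect_right


def window_count(sorted_positions, center, half_width):
    """Count uniquely mapped reads within +/- half_width of center.

    sorted_positions must be sorted in ascending order.
    """
    lo = bisect_left(sorted_positions, center - half_width)
    hi = bisect_right(sorted_positions, center + half_width)
    return hi - lo


def allocation_probabilities(candidates, unique_reads,
                             half_width=250, pseudocount=1.0):
    """Return one probability per candidate locus.

    candidates:   list of (chrom, position) alignments for one read
    unique_reads: dict mapping chrom -> sorted list of unique read positions
    The pseudocount (an assumption of this sketch) keeps loci with no
    nearby unique reads from receiving probability zero.
    """
    weights = [
        window_count(unique_reads.get(chrom, []), pos, half_width) + pseudocount
        for chrom, pos in candidates
    ]
    total = sum(weights)
    return [w / total for w in weights]


def allocate(candidates, unique_reads, rng, **kwargs):
    """Sample a single locus for the read according to the probabilities."""
    probs = allocation_probabilities(candidates, unique_reads, **kwargs)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Toy data: three unique reads near chr1:110, none near chr2:7000.
unique_reads = {"chr1": [100, 120, 130, 900], "chr2": [5000]}
candidates = [("chr1", 110), ("chr2", 7000)]

probs = allocation_probabilities(candidates, unique_reads)
chosen = allocate(candidates, unique_reads, random.Random(0))
```

In this toy case the chr1 locus receives weight 3 + 1 and the chr2 locus 0 + 1, so the read is assigned to chr1 with probability 0.8. According to the abstract, Allo additionally uses a neural network to recognize the read distribution features of potential peaks, which a simple count-based weight like this one does not capture.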

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2df/10515862/48ed20be1eb8/nihpp-2023.09.12.556916v1-f0001.jpg
