Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping.

Author information

Zeng Xin, Li Bo, Welch Rene, Rojo Constanza, Zheng Ye, Dewey Colin N, Keleş Sündüz

Affiliations

Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America.

California Institute for Quantitative Biosciences, University of California, Berkeley, California, United States of America.

Publication information

PLoS Comput Biol. 2015 Oct 20;11(10):e1004491. doi: 10.1371/journal.pcbi.1004491. eCollection 2015 Oct.

DOI:10.1371/journal.pcbi.1004491

PMID:26484757

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4618727/

Abstract

Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells' regulatory programs. Advances in next-generation sequencing have enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions of genomes have proven difficult to map, since short reads of 50-100 base pairs (bp) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few that can accommodate them are prone to high false positive and false negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments that can allocate multi-mapping reads in highly repetitive regions of genomes with high accuracy. We comprehensively evaluated Perm-seq and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach allows multi-read allocation to be supervised with a variety of data sources, including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of protein-DNA interactions in non-repetitive regions.
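To make the idea of prior-enhanced allocation concrete, the following is a toy sketch and not the Perm-seq implementation. It assumes each multi-mapping read comes with a list of distinct candidate loci, that uniquely mapped reads have already been counted per locus, and that an external data source has been summarized as a positive prior weight per locus. The function name `allocate_multireads`, its parameters, and the "prior pseudo-count plus current coverage" weighting rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def allocate_multireads(unique_counts, multireads, prior, n_iter=50):
    """Fractionally allocate multi-mapping reads across their candidate loci.

    unique_counts: dict mapping locus -> number of uniquely mapped reads
    multireads:    list of lists; each inner list holds the distinct
                   candidate loci of one multi-mapping read
    prior:         dict mapping locus -> positive prior weight (hypothetical;
                   e.g. a summary of another assay); missing loci default to 1.0
    Returns a list of dicts (one per read) mapping locus -> allocated fraction.
    """
    # Start from an even split of every read over its candidate loci.
    alloc = [{loc: 1.0 / len(cands) for loc in cands} for cands in multireads]

    for _ in range(n_iter):
        # Expected coverage per locus: unique reads plus current fractions.
        coverage = defaultdict(float)
        for loc, count in unique_counts.items():
            coverage[loc] += count
        for fractions in alloc:
            for loc, weight in fractions.items():
                coverage[loc] += weight

        # Re-weight each read: score = prior pseudo-count + expected coverage,
        # then normalize so the fractions of one read sum to 1.
        new_alloc = []
        for cands in multireads:
            scores = {loc: prior.get(loc, 1.0) + coverage[loc] for loc in cands}
            total = sum(scores.values())
            new_alloc.append({loc: s / total for loc, s in scores.items()})
        alloc = new_alloc

    return alloc


# Usage: two loci with identical unique-read support; the prior favors "A".
example = allocate_multireads(
    unique_counts={"A": 10, "B": 10},
    multireads=[["A", "B"]] * 4,
    prior={"A": 5.0, "B": 1.0},
)
print(example[0])
```

In this sketch, loci with equal unique-read support receive unequal shares of the ambiguous reads only because the prior differs, which is the intuition behind using additional data types to guide allocation. A method that discards multi-reads would instead ignore these reads entirely.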
