Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping.

Author information

Zeng Xin, Li Bo, Welch Rene, Rojo Constanza, Zheng Ye, Dewey Colin N, Keleş Sündüz

Affiliations

Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America.

California Institute for Quantitative Biosciences, University of California, Berkeley, California, United States of America.

Publication information

PLoS Comput Biol. 2015 Oct 20;11(10):e1004491. doi: 10.1371/journal.pcbi.1004491. eCollection 2015 Oct.

Abstract

Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells' regulatory programs. Advances in next-generation sequencing have enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions of genomes have proven difficult to map, since short reads of 50-100 base pairs (bps) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few that can accommodate them are prone to large false positive and false negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments that can allocate multi-mapping reads in highly repetitive regions of genomes with high accuracy. We comprehensively evaluated Perm-seq and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach facilitates supervision of multi-read allocation with a variety of data sources, including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of protein-DNA interactions in non-repetitive regions.
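
The idea of prior-enhanced multi-read allocation can be illustrated with a small sketch. This is not the Perm-seq model itself, whose details the abstract does not give; it is a generic iterative scheme, assumed here for illustration only, in which each multi-mapping read is split among its candidate regions in proportion to a weight built from uniquely mapped read counts plus a prior pseudo-count. The function name `allocate_multireads`, the region labels, and the form of the prior are all hypothetical.

```python
def allocate_multireads(uni_counts, multi_reads, prior, n_iter=50):
    """Fractionally allocate multi-mapping reads among candidate regions.

    uni_counts  : dict region -> number of uniquely mapped reads
    multi_reads : list of lists; each inner list holds the distinct
                  candidate regions of one multi-mapping read
    prior       : dict region -> non-negative prior pseudo-count
                  (hypothetical; e.g. derived from an external data set)

    Illustrative scheme only, not the published Perm-seq algorithm.
    """
    regions = set(uni_counts) | set(prior)
    regions |= {r for cands in multi_reads for r in cands}

    def base_weight():
        # Evidence that does not depend on the multi-read allocation.
        return {r: uni_counts.get(r, 0.0) + prior.get(r, 0.0) for r in regions}

    weight = base_weight()
    alloc = []
    for _ in range(n_iter):
        # Split each read among its candidates in proportion to current weights.
        alloc = []
        for cands in multi_reads:
            total = sum(weight[r] for r in cands)
            if total > 0:
                alloc.append({r: weight[r] / total for r in cands})
            else:
                # No evidence at all: fall back to a uniform split.
                alloc.append({r: 1.0 / len(cands) for r in cands})
        # Recompute region weights from the base evidence plus fractional reads.
        weight = base_weight()
        for fractions in alloc:
            for r, f in fractions.items():
                weight[r] += f
    return alloc


# Two regions with identical unique-read support; only the prior differs.
result = allocate_multireads(
    uni_counts={"regionA": 5, "regionB": 5},
    multi_reads=[["regionA", "regionB"]] * 4,
    prior={"regionA": 0.0, "regionB": 5.0},
)
print(result[0])
```

In this toy input the unique reads cannot distinguish the two regions, so the prior alone tips the allocation toward `regionB`, which is the kind of situation in which additional data types would matter.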

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74b3/4618727/e85003a9f342/pcbi.1004491.g001.jpg
