Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

Affiliation

Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America.

Publication information

PLoS Comput Biol. 2011 Jul;7(7):e1002111. doi: 10.1371/journal.pcbi.1002111. Epub 2011 Jul 14.

DOI: 10.1371/journal.pcbi.1002111

PMID: 21779159

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3136429/

Abstract

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
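The idea of allocating multi-reads as fractional counts can be illustrated with a small sketch. This is not the authors' implementation: the function names, the bin-level coverage model, the pseudocount, and the fixed number of iterations are all assumptions chosen for illustration. Each read starts with equal weight on every candidate location, and the weights are then repeatedly rescaled in proportion to the coverage contributed by the other reads at each location.

```python
from collections import defaultdict


def fractional_coverage(weights):
    """Sum the fractional counts of all reads at each location."""
    coverage = defaultdict(float)
    for per_read in weights.values():
        for location, w in per_read.items():
            coverage[location] += w
    return dict(coverage)


def allocate_multireads(alignments, n_iter=20, pseudocount=1.0):
    """Assign each read a weight per candidate location (hypothetical scheme).

    alignments: dict mapping read id -> list of candidate locations (e.g. bin ids).
    Returns a dict mapping read id -> {location: weight}; weights sum to 1 per read.
    """
    # Drop repeated locations so every read has distinct candidates.
    candidates = {r: list(dict.fromkeys(locs)) for r, locs in alignments.items()}
    # Start from a uniform split across candidate locations.
    weights = {r: {loc: 1.0 / len(locs) for loc in locs}
               for r, locs in candidates.items()}
    for _ in range(n_iter):
        coverage = fractional_coverage(weights)
        updated = {}
        for r, per_read in weights.items():
            # Score each location by the coverage from all other reads,
            # plus a pseudocount so no location is ruled out entirely.
            scores = {loc: coverage[loc] - w + pseudocount
                      for loc, w in per_read.items()}
            total = sum(scores.values())
            updated[r] = {loc: s / total for loc, s in scores.items()}
        weights = updated
    return weights


# Three uni-reads at bin "A"; one multi-read that aligns to both "A" and "B".
example = {"u1": ["A"], "u2": ["A"], "u3": ["A"], "m1": ["A", "B"]}
example_weights = allocate_multireads(example)
example_coverage = fractional_coverage(example_weights)
```

In this toy input the multi-read shifts most of its weight to the bin already supported by uni-reads, while uni-reads keep a weight of 1. A realistic scheme would score a window around each alignment rather than a single bin and would also use alignment quality.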

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/107f/3136429/f0c288281f94/pcbi.1002111.g001.jpg

Similar articles

Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping.

PLoS Comput Biol. 2015 Oct 20;11(10):e1004491. doi: 10.1371/journal.pcbi.1004491. eCollection 2015 Oct.

CNV-guided multi-read allocation for ChIP-seq.

Bioinformatics. 2014 Oct 15;30(20):2860-7. doi: 10.1093/bioinformatics/btu402. Epub 2014 Jun 24.

Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data.

Nucleic Acids Res. 2008 Sep;36(16):5221-31. doi: 10.1093/nar/gkn488. Epub 2008 Aug 6.

Important biological information uncovered in previously unaligned reads from chromatin immunoprecipitation experiments (ChIP-Seq).

Sci Rep. 2015 Mar 2;5:8635. doi: 10.1038/srep08635.

AREM: aligning short reads from ChIP-sequencing by expectation maximization.

J Comput Biol. 2011 Nov;18(11):1495-505. doi: 10.1089/cmb.2011.0185. Epub 2011 Oct 28.

An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq.

PLoS One. 2011 Feb 16;6(2):e16432. doi: 10.1371/journal.pone.0016432.

Cell-type specificity of ChIP-predicted transcription factor binding sites.

BMC Genomics. 2012 Aug 3;13:372. doi: 10.1186/1471-2164-13-372.

Using CisGenome to analyze ChIP-chip and ChIP-seq data.

Curr Protoc Bioinformatics. 2011 Mar;Chapter 2:Unit2.13. doi: 10.1002/0471250953.bi0213s33.

Accurate allocation of multimapped reads enables regulatory element analysis at repeats.

Genome Res. 2024 Jul 23;34(6):937-951. doi: 10.1101/gr.278638.123.

Cited by

The epigenetics effects of transposable elements are genomic context dependent and not restricted to gene silencing in Drosophila.

Genome Biol. 2025 Aug 18;26(1):251. doi: 10.1186/s13059-025-03705-4.

Synergistic roles of NFATc1 and c-Jun in immunomodulation.

Biochem Biophys Rep. 2025 Jul 7;43:102137. doi: 10.1016/j.bbrep.2025.102137. eCollection 2025 Sep.

ChIP-seq Data Processing and Relative and Quantitative Signal Normalization for .

Bio Protoc. 2025 May 5;15(9):e5299. doi: 10.21769/BioProtoc.5299.

Complex determinants of R-loop formation at transposable elements and major DNA satellites.

Genetics. 2025 Apr 17;229(4). doi: 10.1093/genetics/iyaf035.

Identification of transcription factor co-binding patterns with non-negative matrix factorization.

Nucleic Acids Res. 2024 Oct 14;52(18):e85. doi: 10.1093/nar/gkae743.

Accurate allocation of multimapped reads enables regulatory element analysis at repeats.

Genome Res. 2024 Jul 23;34(6):937-951. doi: 10.1101/gr.278638.123.

Bulk Segregant Analysis Sequencing and RNA-Seq Analyses Reveal Candidate Genes Associated with Sepal Color Phenotype of Eggplant ( L.).

Plants (Basel). 2024 May 16;13(10):1385. doi: 10.3390/plants13101385.

Disregarding multimappers leads to biases in the functional assessment of NGS data.

BMC Genomics. 2024 May 8;25(1):455. doi: 10.1186/s12864-024-10344-9.

Identification of transcription factor high accumulation DNA zones.

BMC Bioinformatics. 2023 Oct 20;24(1):395. doi: 10.1186/s12859-023-05528-1.

Dot1l cooperates with Npm1 to repress endogenous retrovirus MERVL in embryonic stem cells.

Nucleic Acids Res. 2023 Sep 22;51(17):8970-8986. doi: 10.1093/nar/gkad640.
