Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data.

Author information

Jothi Raja, Cuddapah Suresh, Barski Artem, Cui Kairong, Zhao Keji

Affiliation

Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Nucleic Acids Res. 2008 Sep;36(16):5221-31. doi: 10.1093/nar/gkn488. Epub 2008 Aug 6.

Abstract

ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with ultra-high-throughput massively parallel sequencing, is increasingly being used to map protein-DNA interactions in vivo on a genome scale. Typically, short sequence reads from ChIP-Seq are mapped to a reference genome for further analysis. Although genomic regions enriched with mapped reads can be inferred as approximate binding regions, short read lengths (approximately 25-50 nt) make it difficult to determine the exact binding sites within these regions. Here, we present SISSRs (Site Identification from Short Sequence Reads), a novel algorithm for precise identification of binding sites from short reads generated in ChIP-Seq experiments. The sensitivity and specificity of SISSRs are demonstrated by applying it to ChIP-Seq data for three widely studied and well-characterized human transcription factors: CTCF (CCCTC-binding factor), NRSF (neuron-restrictive silencer factor) and STAT1 (signal transducer and activator of transcription protein 1). We identified 26,814, 5,813 and 73,956 binding sites for the CTCF, NRSF and STAT1 proteins, respectively, which are 32%, 299% and 78% more than previously inferred for the respective proteins. Motif analysis revealed that an overwhelming majority of the identified binding sites contained the previously established consensus binding sequence for the respective proteins, thus attesting to SISSRs' accuracy. SISSRs' sensitivity and precision facilitated further analyses of ChIP-Seq data, yielding insights that we believe will serve as guidance for designing ChIP-Seq experiments to map in vivo protein-DNA interactions. We also show that tag densities at the binding sites are a good indicator of protein-DNA binding affinity, which could be used to distinguish and characterize strong and weak binding sites. Using tag density as an indicator of DNA-binding affinity, we have identified core residues within the NRSF and CTCF binding sites that are critical for stronger DNA binding.
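
The abstract does not spell out how binding sites are localized within an enriched region, so the following is only a minimal sketch of one way strand-specific short reads can narrow down a site and yield a tag density. It is not the SISSRs implementation. It assumes that each read is recorded as the genomic coordinate of its 5' end plus its strand, that sense-strand reads tend to pile up upstream of a site and antisense-strand reads downstream, and that a candidate site sits where the net (sense minus antisense) count flips from positive to negative. The function names and the `window`, `min_tags`, `max_gap` and `flank` parameters are hypothetical choices made for illustration.

```python
from collections import Counter


def net_tag_counts(reads, window=20):
    """Return {window_index: sense_count - antisense_count}.

    `reads` is an iterable of (position, strand) pairs, where position is
    the 5' end coordinate of a mapped read and strand is '+' or '-'.
    """
    net = Counter()
    for pos, strand in reads:
        net[pos // window] += 1 if strand == "+" else -1
    return net


def candidate_sites(reads, window=20, min_tags=2, max_gap=200):
    """Report midpoints where a sense-dominated window is followed by an
    antisense-dominated window within `max_gap` bp (illustrative rule)."""
    net = net_tag_counts(reads, window)
    occupied = sorted(net)  # indices of windows holding at least one read
    sites = []
    for left, right in zip(occupied, occupied[1:]):
        left_end = (left + 1) * window   # end of the sense-dominated window
        right_start = right * window     # start of the antisense-dominated window
        if (net[left] >= min_tags
                and net[right] <= -min_tags
                and right_start - left_end <= max_gap):
            sites.append((left_end + right_start) // 2)
    return sites


def tag_density(reads, site, flank=60):
    """Count reads whose 5' end lies within `flank` bp of `site`."""
    return sum(1 for pos, _ in reads if abs(pos - site) <= flank)


# Toy data: three sense reads upstream and three antisense reads downstream.
toy_reads = [(100, "+"), (105, "+"), (110, "+"),
             (180, "-"), (185, "-"), (190, "-")]
toy_sites = candidate_sites(toy_reads)
toy_densities = [tag_density(toy_reads, s) for s in toy_sites]
```

On the toy data, the single sense-to-antisense transition gives one candidate site midway between the two read clusters, and all six reads fall within the flank used for the density count. Under the abstract's premise that tag density tracks binding affinity, such counts could be used to rank candidate sites from strong to weak.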

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aba4/2532738/5fdcd42075d4/gkn488f1.jpg
