Vega Vinsensius B, Cheung Edwin, Palanisamy Nallasivam, Sung Wing-Kin
Computational and Mathematical Biology Group, Genome Institute of Singapore, Singapore, Singapore.
PLoS One. 2009;4(4):e5241. doi: 10.1371/journal.pone.0005241. Epub 2009 Apr 15.
The growth of sequencing-based Chromatin Immuno-Precipitation (ChIP) studies calls for a more in-depth understanding of the nature of the technology and of the resultant data, in order to reduce false positives and false negatives. Control libraries are typically constructed to complement such studies and to mitigate the effect of systematic biases that might be present in the data. In this study, we explored multiple control libraries to obtain a better understanding of what they truly represent.
First, we analyzed the genome-wide profiles of various sequencing-based libraries at a low resolution of 1 Mbp, and compared them with each other as well as against aCGH data. We found that copy number has a major influence on both ChIP-enriched and control libraries. We then inspected the repeat regions to assess the extent of mapping bias. Next, significantly tag-rich 5 kbp regions were identified and associated with various genomic landmarks. For instance, we discovered that gene boundaries were surprisingly enriched with sequenced tags. Further, profiles from different cell types were noticeably distinct, even though the cell types were related and similar.
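The windowed analysis described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`bin_tags`, `profile`, `pearson`, `tag_rich_bins`) are hypothetical, Pearson correlation is assumed as the comparison measure, and the fold-over-mean cutoff is only a stand-in for the significance test, which the abstract does not specify.

```python
from collections import Counter
from math import sqrt


def bin_tags(positions, bin_size=1_000_000):
    """Count mapped tags per fixed-size window (1 Mbp by default)."""
    return Counter(pos // bin_size for pos in positions)


def profile(counts, n_bins):
    """Turn sparse bin counts into a dense list of length n_bins."""
    return [counts.get(i, 0) for i in range(n_bins)]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def tag_rich_bins(dense_profile, fold=3.0):
    """Indices of windows whose count is at least `fold` times the mean.

    Assumption: a simple fold threshold replaces the (unspecified)
    statistical test used to call significantly tag-rich regions.
    """
    mean = sum(dense_profile) / len(dense_profile)
    return [i for i, c in enumerate(dense_profile) if c > 0 and c >= fold * mean]
```

Calling `bin_tags` with `bin_size=5_000` gives the 5 kbp windows, and `pearson` can be applied to a control-library profile and a copy-number profile binned on the same grid.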
We found that control libraries bear traces of systematic biases. These biases can be attributed to genomic copy number, inherent sequencing bias, plausible mapping ambiguity, and cell-type-specific chromatin structure. Our results suggest that careful analysis of control libraries can reveal promising biological insights.