Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.
PLoS One. 2013 Jul 26;8(7):e69853. doi: 10.1371/journal.pone.0069853. Print 2013.
DNase I is an enzyme that cuts duplex DNA at a rate that depends strongly on its chromatin environment. In combination with high-throughput sequencing (HTS) technology, it can be used to infer genome-wide landscapes of open chromatin regions. This approach has enabled the systematic identification of hundreds of thousands of DNase I hypersensitive sites (DHSs) per cell type, which in turn has helped to delineate genomic regulatory compartments precisely. However, to date there has been relatively little investigation into possible biases affecting these data.
We report a significant degree of sequence preference spanning sites cut by DNase I in a number of published data sets. The two major protocols in current use each show a different pattern, but for a given protocol the pattern of sequence specificity seems to be quite consistent. The patterns are substantially different from biases seen in other types of HTS data sets, and in some cases the most constrained position lies outside the sequenced fragment, implying that this constraint must relate to the digestion process rather than events occurring during library preparation or sequencing.
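One way to look for a sequence preference of this kind is to collect a fixed-width window of reference sequence around each inferred cut site and tabulate base frequencies at each position relative to the cut. The following is a minimal sketch of that idea, not the analysis used in the study. The function names, the window convention (the cut lies between the two central bases), and the uniform-background information measure are all assumptions made for illustration. Reads on the reverse strand would need to be reverse-complemented first, which is omitted here. Because the window is taken from the reference rather than from the read, it can include positions outside the sequenced fragment.

```python
import math
from collections import Counter

BASES = "ACGT"

def window_around_cut(reference, cut_pos, flank):
    """Return 2*flank bases centred on a cut between cut_pos-1 and cut_pos.

    Hypothetical helper: returns None if the window would run off
    either end of the reference sequence.
    """
    start, end = cut_pos - flank, cut_pos + flank
    if start < 0 or end > len(reference):
        return None
    return reference[start:end]

def position_frequencies(windows):
    """Per-position base frequencies across equal-length windows."""
    length = len(windows[0])
    counts = [Counter() for _ in range(length)]
    for window in windows:
        for i, base in enumerate(window):
            if base in BASES:  # skip N and other ambiguity codes
                counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c[b] for b in BASES)
        freqs.append({b: (c[b] / total if total else 0.0) for b in BASES})
    return freqs

def information_content(freq):
    """Information in bits relative to a uniform background (assumption).

    0 bits means no preference at this position; 2 bits means a single
    base is always observed. A real analysis would likely use the
    genomic base composition as background instead.
    """
    return 2.0 + sum(p * math.log2(p) for p in freq.values() if p > 0)
```

Positions with high information content are the most constrained. Comparing such profiles between data sets produced with different protocols is one way to see whether the pattern is protocol-specific.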
DNase I is a sequence-specific enzyme, with a specificity that may depend on experimental conditions. This sequence specificity is not taken into account by existing pipelines for identifying open chromatin regions. Care must be taken when interpreting DNase I results, especially when looking at the precise locations of the reads. Future studies may be able to improve the sensitivity and precision of chromatin state measurement by compensating for sequence bias.