MIT Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA.
Bioinformatics. 2010 Dec 15;26(24):3028-34. doi: 10.1093/bioinformatics/btq590. Epub 2010 Oct 21.
Clusters of protein-DNA interaction events involving the same transcription factor are known to act as key components of invertebrate and mammalian promoters and enhancers. However, detecting closely spaced homotypic events from ChIP-Seq data is challenging because random variation in the ChIP fragmentation process obscures event locations.
The Genome Positioning System (GPS) predicts protein-DNA interaction events at high spatial resolution from ChIP-Seq data, while retaining the ability to resolve closely spaced events that appear as a single cluster of reads. GPS models observed reads using a complexity-penalized mixture model and efficiently predicts event locations with a segmented EM algorithm. An optional mode permits GPS to align common events across distinct experiments. GPS detects more joint events in synthetic and actual ChIP-Seq data and has superior spatial resolution when compared with other methods. In addition, the specificity and sensitivity of GPS are superior to or comparable with those of other methods.
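To make the idea of a complexity-penalized mixture model concrete, the following is a minimal sketch and not the GPS implementation. It assumes a fixed Laplace-shaped read distribution around each event, a small set of candidate event positions, and a simple penalty that subtracts a constant `alpha` from each component's expected read count in the M-step, so that weakly supported components drop to zero weight. The names `read_density`, `sparse_em`, `scale`, and `alpha` are hypothetical, and the segmentation of the genome into independently processed regions is omitted.

```python
import math


def read_density(offset, scale=30.0):
    """Assumed read distribution around an event: a Laplace density (illustrative only)."""
    return math.exp(-abs(offset) / scale) / (2.0 * scale)


def sparse_em(reads, candidates, alpha=10.0, iters=100):
    """EM over the mixing weights of fixed-position components.

    Each candidate position is one mixture component. The M-step subtracts
    `alpha` from every expected read count and clips at zero, which acts as
    a complexity penalty: poorly supported components are removed.
    """
    weights = [1.0 / len(candidates)] * len(candidates)
    for _ in range(iters):
        counts = [0.0] * len(candidates)
        # E-step: softly assign each read to the candidate events.
        for r in reads:
            joint = [w * read_density(r - m) for w, m in zip(weights, candidates)]
            total = sum(joint)
            if total == 0.0:
                continue
            for j, value in enumerate(joint):
                counts[j] += value / total
        # M-step with penalty: shrink counts by alpha, clip at zero, renormalize.
        shrunk = [max(0.0, c - alpha) for c in counts]
        norm = sum(shrunk)
        if norm == 0.0:
            break  # penalty removed every component; keep the previous weights
        weights = [s / norm for s in shrunk]
    # Report only the candidates that kept a nonzero weight.
    return {m: w for m, w in zip(candidates, weights) if w > 0.0}


# Toy input: 50 reads at each of two positions, with a spurious candidate in between.
reads = [1000] * 50 + [1100] * 50
events = sparse_em(reads, candidates=[1000, 1050, 1100], alpha=10.0)
print(events)
```

In this toy input the candidate at 1050 loses its weight after a few iterations, leaving the two flanking positions with roughly equal weight. That is the behavior the penalty is meant to illustrate: two closely spaced events are kept apart rather than merged into one broad cluster.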