Jiao Shuo, Bailey Cheryl P, Zhang Shunpu, Ladunga Istvan
Fred Hutchinson Cancer Institute, Seattle, WA, USA.
Methods Mol Biol. 2010;674:161-77. doi: 10.1007/978-1-60761-854-6_10.
Localizing the binding sites of regulatory proteins is becoming increasingly feasible and accurate. This is due to dramatic progress not only in chromatin immunoprecipitation combined with next-generation sequencing (ChIP-seq) but also in advanced statistical analyses. A fundamental issue, however, is the alarming number of false positive predictions. This problem can be remedied by improved peak-calling methods that model the twin peaks of sequence tags, one on each strand of the DNA, with kernel density estimators, and by false discovery rate (FDR) estimation based on control libraries. Predictions are further filtered by de novo motif discovery in the regions surrounding the peaks. These methods have been implemented in, among others, Valouev et al.'s Quantitative Enrichment of Sequence Tags (QuEST) software tool. We demonstrate the prediction of binding sites of the human GA-binding protein alpha (GABPalpha) from ChIP-seq observations.
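The strand-aware peak calling and control-based FDR estimation described above can be illustrated with a short sketch. It assumes tag 5' positions are already split by strand and that the offset between the twin peaks (`shift`) is known; QuEST estimates such quantities from the data, so the fixed `shift`, `bandwidth`, and `threshold` values here are hypothetical. The FDR step scales a control library to the ChIP library size and counts how many control "peaks" pass the same threshold, a generic empirical approach that is not necessarily QuEST's exact procedure. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Peak = Tuple[int, float]  # (genomic coordinate, smoothed score)


def kde(positions: List[int], x: float, bandwidth: float) -> float:
    """Gaussian kernel density of tag positions at x (scaled by tag count)."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)


def combined_profile(plus: List[int], minus: List[int], start: int, end: int,
                     shift: float, bandwidth: float) -> List[float]:
    """Merge the two strand-specific peaks into one score per coordinate.

    Forward-strand tags pile up about `shift` bp upstream of a binding site and
    reverse-strand tags about `shift` bp downstream, so each strand's density
    is read at the offset that centres it on the site.
    """
    return [kde(plus, x - shift, bandwidth) + kde(minus, x + shift, bandwidth)
            for x in range(start, end + 1)]


def call_peaks(profile: List[float], start: int, threshold: float) -> List[Peak]:
    """Report local maxima whose score reaches the threshold."""
    peaks = []
    for i in range(1, len(profile) - 1):
        if (profile[i] >= threshold
                and profile[i] > profile[i - 1]
                and profile[i] >= profile[i + 1]):
            peaks.append((start + i, profile[i]))
    return peaks


def empirical_fdr(n_chip_peaks: int, n_control_peaks: int) -> float:
    """Estimate FDR as control peaks / ChIP peaks at one threshold, capped at 1."""
    if n_chip_peaks == 0:
        return 0.0
    return min(1.0, n_control_peaks / n_chip_peaks)


def call_with_fdr(chip_plus: List[int], chip_minus: List[int],
                  ctrl_plus: List[int], ctrl_minus: List[int],
                  start: int, end: int, shift: float = 20.0,
                  bandwidth: float = 5.0,
                  threshold: float = 0.5) -> Tuple[List[Peak], float]:
    # Assumption: shift, bandwidth and threshold are illustrative, not QuEST defaults.
    chip = combined_profile(chip_plus, chip_minus, start, end, shift, bandwidth)
    # Scale the control profile to the ChIP library size so one threshold fits both.
    ratio = (len(chip_plus) + len(chip_minus)) / max(1, len(ctrl_plus) + len(ctrl_minus))
    ctrl = [v * ratio for v in
            combined_profile(ctrl_plus, ctrl_minus, start, end, shift, bandwidth)]
    chip_peaks = call_peaks(chip, start, threshold)
    ctrl_peaks = call_peaks(ctrl, start, threshold)
    return chip_peaks, empirical_fdr(len(chip_peaks), len(ctrl_peaks))


# Toy data: forward tags near 480 and reverse tags near 520 flank a site at 500.
peaks, fdr = call_with_fdr(
    chip_plus=[478, 479, 480, 480, 481, 482],
    chip_minus=[518, 519, 520, 520, 521, 522],
    ctrl_plus=[410, 550, 590], ctrl_minus=[430, 570],
    start=400, end=600)
print(peaks[0][0], fdr)  # prints "500 0.0"
```

Merging the strands before thresholding means a site is scored only where forward and reverse tag clusters agree, which is one way the twin-peak signature suppresses single-strand noise; real data would also need the motif-based filtering mentioned above.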