Johnson David S, Li Wei, Gordon D Benjamin, Bhattacharjee Arindam, Curry Bo, Ghosh Jayati, Brizuela Leonardo, Carroll Jason S, Brown Myles, Flicek Paul, Koch Christoph M, Dunham Ian, Bieda Mark, Xu Xiaoqin, Farnham Peggy J, Kapranov Philipp, Nix David A, Gingeras Thomas R, Zhang Xinmin, Holster Heather, Jiang Nan, Green Roland D, Song Jun S, McCuine Scott A, Anton Elizabeth, Nguyen Loan, Trinklein Nathan D, Ye Zhen, Ching Keith, Hawkins David, Ren Bing, Scacheri Peter C, Rozowsky Joel, Karpikov Alexander, Euskirchen Ghia, Weissman Sherman, Gerstein Mark, Snyder Michael, Yang Annie, Moqtaderi Zarmik, Hirsch Heather, Shulha Hennady P, Fu Yutao, Weng Zhiping, Struhl Kevin, Myers Richard M, Lieb Jason D, Liu X Shirley
Department of Genetics, Stanford University Medical Center, Stanford, California 94305, USA.
Genome Res. 2008 Mar;18(3):393-403. doi: 10.1101/gr.7080508. Epub 2008 Feb 7.
The most widely used method for detecting genome-wide protein-DNA interactions is chromatin immunoprecipitation on tiling microarrays, commonly known as ChIP-chip. Here, we conducted the first objective analysis of tiling array platforms, amplification procedures, and signal detection algorithms in a simulated ChIP-chip experiment. Mixtures of human genomic DNA and "spike-ins" composed of nearly 100 human sequences at various concentrations were hybridized to four tiling array platforms by eight independent groups. Blind to the number of spike-ins, their locations, and the range of concentrations, each group made predictions of the spike-in locations. We found that microarray platform choice is not the primary determinant of overall performance. In fact, variation in performance between labs, protocols, and algorithms within the same array platform was greater than the variation in performance between array platforms. However, each array platform had unique performance characteristics that varied with tiling resolution and the number of replicates, with implications for the trade-off between cost and detection power. Long oligonucleotide arrays were slightly more sensitive at detecting very low enrichment. On all platforms, simple sequence repeats and genome redundancy tended to result in false positives. The two most popular sample amplification techniques, ligation-mediated PCR (LM-PCR) and whole-genome amplification (WGA), reproduced relative enrichment levels with high fidelity. The relative performance of signal detection algorithms depended heavily on the array platform. The spike-in DNA samples and the data presented here provide a stable benchmark against which future ChIP platforms, protocol improvements, and analysis methods can be evaluated.
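The abstract does not describe how each group's predictions were scored against the true spike-in locations. Purely as an illustration, the sketch below assumes that a prediction counts as correct if it overlaps any true spike-in interval, that intervals are half-open `(chromosome, start, end)` tuples, and that predictions are ranked from most to least confident. Under those assumptions it traces sensitivity and false discovery rate (FDR) at every rank cutoff. The function names `overlaps` and `evaluate_ranked_predictions` are hypothetical and not part of the study.

```python
from typing import List, Tuple

# Hypothetical interval type: (chromosome, start, end), half-open like BED.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def evaluate_ranked_predictions(
    truth: List[Interval], predictions: List[Interval]
) -> List[Tuple[int, float, float]]:
    """For each cutoff k (top-k predictions), return (k, sensitivity, FDR).

    Assumed scoring rule (not taken from the study):
    - a prediction is a true positive if it overlaps any true spike-in;
    - sensitivity = fraction of true spike-ins overlapped by the top k;
    - FDR = fraction of the top k predictions that overlap no spike-in.
    """
    results = []
    found = set()   # indices of true spike-ins recovered so far
    true_pos = 0    # number of predictions that hit some spike-in
    for k, pred in enumerate(predictions, start=1):
        hits = [i for i, t in enumerate(truth) if overlaps(pred, t)]
        if hits:
            true_pos += 1
            found.update(hits)
        sensitivity = len(found) / len(truth) if truth else 0.0
        fdr = (k - true_pos) / k
        results.append((k, sensitivity, fdr))
    return results
```

Plotting sensitivity against FDR across cutoffs gives one curve per submission, which is one way to compare labs, protocols, and algorithms on the same footing without fixing a single score threshold.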