Kechris Katerina J, Biehs Brian, Kornberg Thomas B
University of Colorado Denver, CO, USA.
Stat Appl Genet Mol Biol. 2010;9(1):Article29. doi: 10.2202/1544-6115.1434. Epub 2010 Aug 6.
High-density tiling arrays are an effective strategy for genome-wide identification of transcription factor binding regions. Sliding window methods that calculate moving averages of log ratios or t-statistics have been useful for the analysis of tiling array data. Here, we present a method that generalizes the moving average approach to evaluate sliding windows of p-values by using combined p-value statistics. In particular, the combined p-value framework can be useful when averaging the corresponding test statistics is not appropriate or when the significance of these averages is difficult to assess. We demonstrate the strengths of the combined p-value methods on Drosophila tiling array data and assess their ability to predict genomic regions enriched for transcription factor binding. The predictions are evaluated based on their proximity to target genes and their enrichment for known transcription factor binding sites. We also present an application of the generalized moving average that integrates two different tiling array experiments.
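The idea of replacing a moving average with a combined p-value over each sliding window can be illustrated with a minimal sketch. The abstract does not specify which combined p-value statistics are used, so this example assumes Fisher's method, assumes the probe-level p-values within a window are independent (neighboring probes on a real tiling array may violate this), and defines windows by probe count rather than genomic distance. The function names and the input p-values are hypothetical.

```python
import math
from typing import List, Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine p-values with Fisher's method (assumes independence).

    The statistic X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom under the null, and for even degrees of freedom the upper
    tail has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # half_stat is X/2; p-values are floored to avoid log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0  # running sum of the series
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def sliding_combined_pvalues(pvalues: Sequence[float], window: int) -> List[float]:
    """Return one combined p-value per window of `window` consecutive probes."""
    if window < 1 or window > len(pvalues):
        raise ValueError("window must be between 1 and len(pvalues)")
    return [
        fisher_combined_pvalue(pvalues[i:i + window])
        for i in range(len(pvalues) - window + 1)
    ]


# Hypothetical probe-level p-values ordered by genomic position.
probe_pvalues = [0.60, 0.45, 0.02, 0.01, 0.03, 0.70, 0.55]
for start, p in enumerate(sliding_combined_pvalues(probe_pvalues, window=3)):
    print(f"probes {start}-{start + 2}: combined p = {p:.4g}")
```

Under the same assumptions, p-values for the same probe from two different experiments could be combined by passing them together to `fisher_combined_pvalue`, which is one way to read the integration application mentioned in the abstract.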