Transcript normalization and segmentation of tiling array data.

Authors

Zeller Georg, Henz Stefan R, Laubinger Sascha, Weigel Detlef, Rätsch Gunnar

Affiliations

Friedrich Miescher Laboratory of the Max Planck Society & Max Planck Institute for Developmental Biology, Dept. for Molecular Biology, Spemannstr. 35 & 39, 72076 Tübingen, Germany.

Publication

Pac Symp Biocomput. 2008:527-38.

PMID:18229713

Abstract

For the analysis of transcriptional tiling arrays, we have developed two methods based on state-of-the-art machine learning algorithms. First, we present a novel transcript normalization technique to alleviate the effect of oligonucleotide probe sequences on hybridization intensity. It is specifically designed to decrease the variability observed among individual probes complementary to the same transcript. Applying this normalization technique to Arabidopsis tiling arrays, we are able to reduce sequence biases and also significantly improve the separation in signal intensity between exonic and intronic/intergenic probes. Our second contribution is a method for transcript mapping. It extends an algorithm proposed for yeast tiling arrays to the more challenging task of spliced transcript identification. Comparing segmentation on raw and on normalized intensities, we find that our method achieves its highest prediction accuracy when applied to transcript-normalized tiling array data.
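The normalization idea described above (probes complementary to the same transcript should report similar intensities, so within-transcript deviations can be attributed to probe sequence) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain ridge regression on one-hot encoded probe sequences as a stand-in for the learning method used in the paper, and every name in it (`one_hot`, `fit_sequence_effect`, `normalize`, `within_transcript_spread`) as well as the synthetic data is hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a probe sequence as a flat position-by-base indicator vector."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x.ravel()

def fit_sequence_effect(seqs, log_intensity, transcript_id, lam=1.0):
    """Ridge-regress each probe's deviation from its transcript median on its sequence.

    Assumption: a linear model stands in for the paper's learning algorithm.
    """
    X = np.array([one_hot(s) for s in seqs])
    y = np.asarray(log_intensity, dtype=float)
    t = np.asarray(transcript_id)
    dev = np.empty_like(y)
    for tid in np.unique(t):
        mask = t == tid
        dev[mask] = y[mask] - np.median(y[mask])
    # Closed-form ridge solution: (X^T X + lam * I) w = X^T dev
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ dev)

def normalize(seqs, log_intensity, w):
    """Subtract the predicted sequence-dependent deviation from each probe."""
    X = np.array([one_hot(s) for s in seqs])
    return np.asarray(log_intensity, dtype=float) - X @ w

def within_transcript_spread(values, transcript_id):
    """Mean standard deviation of probe values within each transcript."""
    t = np.asarray(transcript_id)
    return float(np.mean([np.std(values[t == tid]) for tid in np.unique(t)]))

# Synthetic data (purely illustrative): 40 transcripts with 20 probes of length 25 each.
rng = np.random.default_rng(0)
n_transcripts, probes_per_transcript, probe_len = 40, 20, 25
transcript_id = np.repeat(np.arange(n_transcripts), probes_per_transcript)
seqs = ["".join(rng.choice(list(BASES), size=probe_len)) for _ in transcript_id]
w_true = rng.normal(0.0, 0.3, size=4 * probe_len)      # simulated sequence bias
level = rng.normal(8.0, 2.0, size=n_transcripts)       # simulated expression levels
raw = (level[transcript_id]
       + np.array([one_hot(s) for s in seqs]) @ w_true
       + rng.normal(0.0, 0.1, size=len(seqs)))

w = fit_sequence_effect(seqs, raw, transcript_id)
normalized = normalize(seqs, raw, w)
before = within_transcript_spread(raw, transcript_id)
after = within_transcript_spread(normalized, transcript_id)
print(f"within-transcript spread: raw={before:.3f}, normalized={after:.3f}")
```

On this simulated data the within-transcript spread shrinks after normalization because the sequence bias was generated by a linear model; real arrays would need held-out evaluation and likely a richer model.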
