Gharaibeh Raad Z, Fodor Anthony A, Gibas Cynthia J
Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
BMC Bioinformatics. 2008 Oct 23;9:452. doi: 10.1186/1471-2105-9-452.
High-density short oligonucleotide microarrays are a primary research tool for assessing global gene expression. Background noise makes up a significant portion of the raw intensities measured on microarrays and, if it is not estimated correctly, can seriously distort the interpretation of the resulting data.
We introduce an approach to calculate probe affinity from sequence composition that incorporates nearest-neighbor (NN) information. Our model uses position-specific dinucleotide information instead of the single-nucleotide information used in the previously published model, and it increases the total variance explained (R²) by up to 10% relative to that model. We demonstrate that correcting for background noise with this approach enhances the performance of the GCRMA preprocessing algorithm on control datasets, especially for detecting low-intensity targets.
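To make the difference between the two feature sets concrete, here is a minimal sketch, assuming a simple additive model in which log intensity = intercept + Σₖ effect(position k, feature at k). In the single-nucleotide version the feature at position k is the base bₖ (25 × 4 indicators for a 25-mer); in the dinucleotide version it is the nearest-neighbor pair bₖbₖ₊₁ (24 × 16 indicators). The function names (`mononucleotide_features`, `dinucleotide_features`, `fit_affinity`) are hypothetical, the fit is plain unsmoothed least squares rather than any smoothing over position a published model may use, and the intensities are synthetic values generated for illustration only, not data from the paper.

```python
import itertools

import numpy as np

BASES = "ACGT"
# All 16 ordered dinucleotides: "AA", "AC", ..., "TT"
DINUCS = ["".join(p) for p in itertools.product(BASES, repeat=2)]
DINUC_INDEX = {d: i for i, d in enumerate(DINUCS)}


def mononucleotide_features(seq: str) -> np.ndarray:
    """One indicator per (position, base): len(seq) * 4 columns."""
    x = np.zeros((len(seq), 4))
    for k, b in enumerate(seq):
        x[k, BASES.index(b)] = 1.0
    return x.ravel()


def dinucleotide_features(seq: str) -> np.ndarray:
    """One indicator per (position, dinucleotide): (len(seq) - 1) * 16 columns.

    Position k encodes the nearest-neighbor pair seq[k], seq[k + 1].
    """
    x = np.zeros((len(seq) - 1, 16))
    for k in range(len(seq) - 1):
        x[k, DINUC_INDEX[seq[k:k + 2]]] = 1.0
    return x.ravel()


def fit_affinity(seqs, log_intensity, featurize):
    """Least-squares fit of log intensity on sequence features; returns (coef, R^2)."""
    X = np.array([featurize(s) for s in seqs])
    X = np.column_stack([np.ones(len(seqs)), X])  # add intercept column
    y = np.asarray(log_intensity, dtype=float)
    # lstsq tolerates the redundant one-hot columns (rank-deficient design)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Synthetic, illustrative data: random 25-mer probes whose "true" log
# intensities depend on per-position dinucleotide effects plus noise.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(BASES), size=25)) for _ in range(2000)]
true_w = rng.normal(size=24 * 16)
y = np.array([dinucleotide_features(s) @ true_w for s in seqs])
y = y + rng.normal(scale=0.5, size=len(seqs))

_, r2_mono = fit_affinity(seqs, y, mononucleotide_features)
_, r2_di = fit_affinity(seqs, y, dinucleotide_features)
print(f"R^2 single-nucleotide: {r2_mono:.3f}  dinucleotide: {r2_di:.3f}")
```

Because the synthetic intensities are generated from dinucleotide effects, the gap between the two R² values here is much larger than anything expected on real array data; the sketch only shows how the nearest-neighbor features let an additive model capture effects that depend on adjacent base pairs, which per-base indicators cannot represent.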
Modifying the previously published position-dependent affinity model to incorporate dinucleotide information significantly improves the performance of the model. The dinucleotide affinity model enhances the detection of differentially expressed genes when implemented as a background correction procedure in GeneChip preprocessing algorithms. This is conceptually consistent with physical models of binding affinity, which depend on the nearest-neighbor stacking interactions in addition to base-pairing.