Klebanov Lev, Chen Linlin, Yakovlev Andrei
Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Avenue, Rochester, Box 630, New York 14642, USA.
Biol Direct. 2007 Nov 7;2:28. doi: 10.1186/1745-6150-2-28.
BACKGROUND: This work was undertaken in response to a recently published paper by Okoniewski and Miller (BMC Bioinformatics 2006, 7: Article 276). The authors of that paper concluded that multiple targeting in short oligonucleotide microarrays induces spurious correlations and that this effect may impair inference on correlation coefficients. The design of their study and their supporting simulations cast serious doubt on the validity of this conclusion. The work by Okoniewski and Miller led us to revisit the issue through experimentation with biological data and probabilistic modeling of cross-hybridization effects.

RESULTS: We have identified two serious flaws in the study by Okoniewski and Miller: (1) the data used in their paper are not amenable to correlation analysis; (2) the proposed simulation model is inadequate for studying the effects of cross-hybridization. Using two other data sets, we have shown that removing multiply targeted probe sets does not shift the histogram of sample correlation coefficients towards smaller values. A more realistic approach to mathematical modeling of cross-hybridization demonstrates that this process is far more complex than the simplistic model considered by the authors. In theory, cross-hybridization can be expected to produce a diversity of correlation effects (such as the induction of positive or negative correlations), but there are natural limits on the ability to gain quantitative insight into such effects because they are not directly observable.

CONCLUSION: The proposed stochastic model is instrumental in studying general regularities in hybridization interaction between probe sets in microarray data. As the problem stands now, there is no compelling reason to believe that multiple targeting causes a large-scale effect on the correlation structure of Affymetrix gene expression data. Our analysis suggests that the observed long-range correlations in microarray data are of a biological nature rather than the result of a technological flaw.
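The diagnostic described in RESULTS compares the distribution of sample correlation coefficients computed with and without the multiply targeted probe sets. The sketch below runs that comparison on synthetic data generated from a deliberately simple, hypothetical additive model of cross-hybridization: each flagged probe set picks up a fixed fraction `leak` of one off-target gene's signal. This is the kind of simplistic model the abstract criticizes, not the authors' stochastic model, and it only shows what a shift would look like if such an effect were present. The function names, parameters and data are all invented for illustration, and the distribution is summarized by its mean rather than a full histogram.

```python
import random
import statistics
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_genes=40, n_samples=50, n_flagged=20, leak=1.0, seed=1):
    """Hypothetical additive cross-hybridization model (illustration only).

    Each gene has an independent N(0, 1) signal across samples. Probe sets
    0..n_flagged-1 are treated as 'multiply targeted': each also picks up
    `leak` times the signal of one shared off-target gene (the last gene).
    Returns the probe-set intensity rows and the set of flagged indices.
    """
    rng = random.Random(seed)
    signal = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
              for _ in range(n_genes)]
    off_target = signal[-1]
    probes = []
    for g, row in enumerate(signal):
        if g < n_flagged:
            # Contaminate the flagged probe set with the off-target signal.
            row = [s + leak * o for s, o in zip(row, off_target)]
        probes.append(row)
    return probes, set(range(n_flagged))


def pairwise_correlations(probes, keep):
    """All sample correlation coefficients between pairs of kept probe sets."""
    return [pearson(probes[i], probes[j])
            for i, j in combinations(sorted(keep), 2)]


probes, flagged = simulate()
r_all = pairwise_correlations(probes, range(len(probes)))
r_clean = pairwise_correlations(probes, set(range(len(probes))) - flagged)

# If multiple targeting induced spurious correlations, dropping the flagged
# probe sets should move the distribution of r toward smaller values.
print(f"all probe sets:       n={len(r_all)}, mean r={statistics.fmean(r_all):.3f}")
print(f"flagged sets removed: n={len(r_clean)}, mean r={statistics.fmean(r_clean):.3f}")
```

Under this toy model the mean correlation drops once the flagged probe sets are removed; the authors report that no such shift appears in the two real data sets they analyzed.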