Wang Minghui, Hu Xiaohua, Li Gang, Leach Lindsey J, Potokina Elena, Druka Arnis, Waugh Robbie, Kearsey Michael J, Luo Zewei
Laboratory of Population & Quantitative Genetics, The State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China.
PLoS Comput Biol. 2009 Mar;5(3):e1000317. doi: 10.1371/journal.pcbi.1000317. Epub 2009 Mar 13.
Affymetrix microarrays are widely used to predict genome-wide gene expression from RNA hybridization experiments and genome-wide genetic polymorphisms from genomic DNA hybridization experiments. It has recently been proposed that the two predictions be integrated using RNA microarray data alone. Although the ability to detect single feature polymorphisms (SFPs) from RNA microarray data has many practical implications for genome studies in both sequenced and unsequenced species, it raises enormous challenges for the statistical modelling and analysis of microarray gene expression data. Several methods have been proposed to predict SFPs from gene expression profiles. However, their performance is highly vulnerable to differential expression of genes, so the SFPs they predict ultimately reflect differentially expressed genes rather than genuine sequence polymorphisms. To address this problem, we developed a novel statistical method that separates two parameters in the perfect-match hybridization values of Affymetrix gene expression data: the binding affinity between a transcript and its targeting probe, and the transcript abundance. We implemented a Bayesian approach to detect SFPs and to genotype a segregating population at the detected SFPs. In an analysis of three Affymetrix microarray datasets, the method detected SFPs that carry genuine sequence polymorphisms with significantly greater robustness and accuracy than competing methods in the literature. The method provides experimental genomicists with analytical tools for appropriate and efficient analysis of their microarray experiments, and gives biostatisticians insight into the interpretation of Affymetrix microarray data.
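The idea of separating probe binding affinity from transcript abundance can be illustrated with a small simulation. The sketch below is not the Bayesian method of the paper. It assumes a simple additive model on the log scale, log PM[i, j] = abundance[i] + affinity[j] + noise, and every name and parameter value in it (`simulate_probe_set`, `additive_residuals`, the probe count, the size of the affinity drop) is hypothetical. One allele is simulated as both more highly expressed and carrying a polymorphism under a single probe, so that differential expression and a true SFP are present together.

```python
import numpy as np

def simulate_probe_set(n_samples=20, n_probes=11, sfp_probe=3, drop=1.5, seed=0):
    """Simulate log-scale perfect-match values for one probe set (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    genotype = np.arange(n_samples) % 2            # alternating 0/1 parental allele
    abundance = 8.0 + 2.0 * genotype               # allele 1 is also more highly expressed
    affinity = rng.normal(0.0, 0.5, n_probes)      # probe-specific binding affinity
    noise = rng.normal(0.0, 0.1, (n_samples, n_probes))
    log_pm = abundance[:, None] + affinity[None, :] + noise
    log_pm[genotype == 1, sfp_probe] -= drop       # polymorphism weakens binding at one probe
    return log_pm, genotype

def additive_residuals(log_pm):
    """Remove per-sample (abundance) and per-probe (affinity) effects by double centering."""
    row = log_pm.mean(axis=1, keepdims=True)       # per-sample level, proxy for abundance
    col = log_pm.mean(axis=0, keepdims=True)       # per-probe level, proxy for affinity
    return log_pm - row - col + log_pm.mean()

log_pm, genotype = simulate_probe_set()
resid = additive_residuals(log_pm)

# A probe whose residuals vary strongly across samples deviates from the additive fit.
candidate = int(np.argmax(resid.var(axis=0)))

# Samples with a negative residual at that probe bind it less well than the fit predicts.
called = (resid[:, candidate] < 0).astype(int)

print("candidate SFP probe:", candidate)
print("genotype calls agree with simulation:", bool((called == genotype).all()))
```

In this toy setting the expression difference between alleles shifts every probe in the set equally and is absorbed by the per-sample term, while the polymorphism affects one probe only and survives in the residuals. A comparison of raw intensities between the two groups would instead show a large difference at every probe, which is the failure mode the abstract attributes to earlier methods.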