Cui Xinping, Xu Jin, Asghar Rehana, Condamine Pascal, Svensson Jan T, Wanamaker Steve, Stein Nils, Roose Mikeal, Close Timothy J
Department of Statistics, University of California, Riverside, 92521, USA.
Bioinformatics. 2005 Oct 15;21(20):3852-8. doi: 10.1093/bioinformatics/bti640. Epub 2005 Aug 23.
Genomic DNA has previously been hybridized to oligonucleotide microarrays to identify single-feature polymorphisms (SFPs) in Arabidopsis, which has a genome size of approximately 130 Mb. However, that method does not work well for organisms such as barley, which has a much larger genome of approximately 5200 Mb. In the present study, we demonstrate SFP detection using a small number of replicate datasets and complex RNA as a surrogate for barley DNA. To identify the single probes that define SFPs in the data, we developed a method based on robustified projection pursuit (RPP). For each probe set, this method first evaluates the overall differentiation of signal intensities between two genotypes and then measures the contribution of each individual probe within the probe set to that overall differentiation.
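The two-step idea (an overall differentiation score per probe set, followed by a per-probe contribution) can be illustrated with a minimal sketch. This is not the RPP algorithm itself, which the abstract does not specify. It is a simplified stand-in that uses per-probe medians as the robust summary and assumes that a shift shared by all probes reflects an expression difference rather than an SFP. The function name, the data layout and the toy intensities are all hypothetical.

```python
from statistics import median


def probe_contributions(geno_a, geno_b):
    """Score one probe set for two genotypes.

    geno_a, geno_b: lists of replicates; each replicate is a list of
    log-scale intensities, one value per probe (hypothetical layout).
    Returns (overall_score, per_probe_contributions).
    """
    n_probes = len(geno_a[0])
    # Robust per-probe difference between genotypes: difference of medians
    # taken across replicates.
    diffs = [
        median(rep[j] for rep in geno_a) - median(rep[j] for rep in geno_b)
        for j in range(n_probes)
    ]
    # Assumption: a shift common to all probes is an expression effect,
    # so remove it by centering on the median difference.
    shift = median(diffs)
    centered = [d - shift for d in diffs]
    # Step 1: overall differentiation of the probe set.
    overall = sum(c * c for c in centered)
    if overall == 0:
        return 0.0, [0.0] * n_probes
    # Step 2: share of the overall differentiation due to each probe.
    return overall, [c * c / overall for c in centered]


# Toy data: three replicates, four probes; only the third probe differs.
geno_a = [[8.0, 9.0, 7.5, 8.5], [8.1, 9.1, 7.4, 8.6], [7.9, 8.9, 7.6, 8.4]]
geno_b = [[8.0, 9.0, 5.5, 8.5], [8.1, 9.0, 5.4, 8.6], [7.9, 9.1, 5.6, 8.4]]
overall, contributions = probe_contributions(geno_a, geno_b)
```

In this toy case the third probe carries the whole score, which is the pattern expected when a single probe overlaps a polymorphism.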
RNA from whole seedlings with and without dehydration stress provided 'present' calls for approximately 75% of probe sets. Using triplicate data, at least 80% of predictions were correct among the 5% of 'present' probe sets identified as most likely to contain at least one SFP probe, as determined by direct sequencing of PCR amplicons derived from barley genomic DNA. Using a 5 percentile cutoff, we defined 2007 SFP probes contained in 1684 probe sets by combining three parental genotype comparisons: Steptoe versus Morex, Morex versus Barke, and Oregon Wolfe Barley Dominant versus Recessive.
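The percentile cutoff can be illustrated as follows, assuming each 'present' probe set already has a differentiation score in which larger means more likely to contain an SFP. The function name, the probe set identifiers and the scores are hypothetical, and the abstract does not state how ties or rounding were handled.

```python
def top_fraction(scores, fraction=0.05):
    """Return the probe set IDs in the top `fraction` by score.

    scores: dict mapping probe set ID -> differentiation score
    (hypothetical input; larger score = stronger SFP candidate).
    """
    # Keep at least one probe set; round to the nearest whole count.
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Forty made-up probe sets with scores 0..39; the top 5% is two probe sets.
scores = {f"ps_{i}": float(i) for i in range(40)}
candidates = top_fraction(scores)
```

The selected probe sets would then be the ones passed on to per-probe scoring and, in the study, checked by sequencing.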
The algorithm is available upon request from the corresponding author.