Talloen Willem, Clevert Djork-Arné, Hochreiter Sepp, Amaratunga Dhammika, Bijnens Luc, Kass Stefan, Göhlmann Hinrich W H
Johnson & Johnson Pharmaceutical Research & Development, Division of Janssen Pharmaceutica n.v., Beerse, Belgium.
Bioinformatics. 2007 Nov 1;23(21):2897-902. doi: 10.1093/bioinformatics/btm478. Epub 2007 Oct 5.
MOTIVATION: DNA microarray technology typically generates many measurements, of which only a relatively small subset is informative for interpreting the experiment. To avoid false positive results, it is therefore critical to select the informative genes from the large, noisy data set before the actual analysis. Most currently available filtering techniques are supervised and therefore carry a risk of overfitting. The unsupervised filtering techniques, on the other hand, are either not very efficient or too stringent, as they may mix up signal with noise. We propose to use the multiple probes measuring the same target mRNA as repeated measures to quantify the signal-to-noise ratio of that specific probe set. A Bayesian factor analysis with specifically chosen prior settings, which models this probe-level information, provides an objective feature filtering technique named informative/non-informative calls (I/NI calls).
RESULTS: Based on 30 real-life data sets (including various human, rat, mouse and Arabidopsis studies) and a spiked-in data set, I/NI calls is shown to be highly effective, with exclusion rates ranging from 70% to 99%. Consequently, it offers a critical solution to the curse of high dimensionality in the analysis of microarray data.
AVAILABILITY: This filtering approach is publicly available as a function implemented in the R package FARMS (www.bioinf.jku.at/software/farms/farms.html).
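The idea of treating the probes of a probe set as repeated measures can be illustrated with a simplified sketch. This is not the FARMS implementation and does not reproduce its Bayesian estimation or priors. It assumes an equal-loading one-factor model on standardized probes, x_ij = λ·z_j + ε_ij with z ~ N(0, 1) and ε ~ N(0, 1 − λ²), in which the mean pairwise probe correlation estimates λ² and the posterior variance of the factor is Var(z | x) = 1 / (1 + P·λ² / (1 − λ²)) for P probes. The function names and the 0.5 cut-off are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardize(values):
    """Center one probe's intensities across arrays and scale to unit variance."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def posterior_factor_variance(probe_set):
    """Var(z | x) under an equal-loading one-factor model (a simplification).

    probe_set: list of at least two probes, each a list of log-intensities
    measured on the same arrays.
    """
    z = [standardize(p) for p in probe_set]
    n_probes, n_arrays = len(z), len(z[0])
    # The mean pairwise correlation estimates the shared (signal) variance.
    corrs = [sum(a * b for a, b in zip(p, q)) / n_arrays
             for p, q in combinations(z, 2)]
    # Clip to [0, 1) so the noise variance 1 - r stays positive.
    r = min(max(mean(corrs), 0.0), 1.0 - 1e-9)
    return 1.0 / (1.0 + n_probes * r / (1.0 - r))


def is_informative(probe_set, threshold=0.5):
    """Call a probe set informative when the data pin down the latent signal."""
    return posterior_factor_variance(probe_set) < threshold


# Probes that rise and fall together across arrays: shared signal dominates.
concordant = [[1, 2, 3, 4], [2, 3, 4, 5], [0, 1, 2, 3]]
# Mutually uncorrelated probes: no shared signal, only noise.
discordant = [[1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]]

print(is_informative(concordant), is_informative(discordant))
```

A probe set whose probes agree across arrays yields a posterior variance near 0 and is kept, while a probe set whose probes vary independently yields a value near 1 and is filtered out. No class labels enter the calculation, which is what makes the filter unsupervised.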