School of Electronics and Computer Science, University of Southampton, Southampton, UK.
Bioinformatics. 2010 May 1;26(9):1185-91. doi: 10.1093/bioinformatics/btq104. Epub 2010 Mar 8.
High-throughput measurement of mRNA abundance with microarrays involves several stages of preprocessing. At each stage, a user can choose from a large number of algorithms, with no universally agreed guidance on which to use. We show that binary representations of gene expression, which retain only whether a gene is expressed or not, reduce the variability in results caused by algorithmic choice while also improving the quality of inference drawn from microarray studies.
Binary representation of transcriptome data has the desirable property of reducing the variability introduced at the preprocessing stages by algorithmic choice. We compare the effect of the choice of algorithms on different problems and suggest that using a binary representation of microarray data with a Tanimoto kernel for support vector machines reduces the effect of that choice and simultaneously improves the performance of phenotype classification.
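As a minimal sketch of the idea, the Python below binarizes expression profiles and computes the Tanimoto kernel between them. The abstract does not say how expressed and non-expressed genes are separated, so the fixed `threshold` is an assumption made purely for illustration, as are the function names and the toy values. The convention of returning 1.0 for two all-zero profiles is also an assumption.

```python
from typing import List, Sequence


def binarize(expression: Sequence[float], threshold: float) -> List[int]:
    """Map each expression value to 1 (expressed) or 0 (not expressed).

    The fixed threshold is a hypothetical rule used only for illustration.
    """
    return [1 if value > threshold else 0 for value in expression]


def tanimoto_kernel(x: Sequence[int], y: Sequence[int]) -> float:
    """Tanimoto similarity of two binary vectors.

    For binary data this is (genes expressed in both profiles) divided by
    (genes expressed in at least one profile).
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of genes")
    both = sum(a & b for a, b in zip(x, y))
    either = sum(a | b for a, b in zip(x, y))
    if either == 0:
        # Assumed convention: two all-zero profiles count as identical.
        return 1.0
    return both / either


def gram_matrix(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Pairwise Tanimoto kernel values for a list of binary profiles."""
    return [[tanimoto_kernel(a, b) for b in samples] for a in samples]


# Toy expression values for two samples over four genes (made up).
raw = [[7.2, 3.1, 9.0, 2.5], [6.8, 8.4, 2.2, 2.9]]
binary = [binarize(profile, threshold=5.0) for profile in raw]
kernel = gram_matrix(binary)
```

A Gram matrix computed this way could be handed to any support vector machine implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.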