Department of Mathematical Sciences, University of Essex, Colchester, UK.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Oct-Dec;7(4):647-53. doi: 10.1109/TCBB.2008.108.
Modern biology has moved from a science of individual measurements to a science where data are collected on an industrial scale. Foremost among the new tools for biochemistry are chip arrays, which, in a single operation, measure hundreds of thousands or even millions of DNA sequences or RNA transcripts. While this is impressive, increasingly sophisticated analysis tools have been required to convert gene array data into gene expression levels. Although noise levels are assumed to be low, the number of measurements for an individual gene is small, so identifying which signals are affected by noise is a priority. Analysis of high-density oligonucleotide arrays (HDONAs) from NCBI GEO shows that, even in the best Human GeneChips, 0.25 percent of data are affected by spatial noise. Earlier designs are noisier, and spatial defects may affect more than 25 percent of probes. Bioconductor R code is available as supplementary material on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.108 and via http://bioinformatics.essex.ac.uk/users/wlangdon/TCBB-2007-11-0161.tar.gz.
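The abstract does not say how spatial defects are detected. The Python below is a purely illustrative sketch of one generic way to flag spatially clustered noise, not the authors' method (their code is the Bioconductor R supplement linked above). It assumes a stack of log2 probe intensities from chips of the same design, so that cell (r, c) is the same probe on every chip. Each probe is compared with its median across chips, the per-chip residual is averaged over a k × k neighbourhood, and cells whose smoothed residual exceeds a threshold are flagged. The function names `box_mean` and `flag_spatial_defects`, the window size k = 5, the threshold of 1.0 log2 units and the synthetic data are all hypothetical choices.

```python
import numpy as np


def box_mean(img: np.ndarray, k: int) -> np.ndarray:
    """Mean over each k x k neighbourhood (k odd); edges padded by replication."""
    r = k // 2
    padded = np.pad(img, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    s = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    s[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    rows, cols = img.shape
    total = (s[k:k + rows, k:k + cols] - s[:rows, k:k + cols]
             - s[k:k + rows, :cols] + s[:rows, :cols])
    return total / (k * k)


def flag_spatial_defects(chips: np.ndarray, k: int = 5,
                         threshold: float = 1.0) -> np.ndarray:
    """Hypothetical detector: flag probes in spatially coherent regions that
    deviate from the across-chip median.

    chips: shape (n_chips, rows, cols), log2 intensities, same chip design.
    Returns a boolean array of the same shape (True = suspected defect).
    """
    reference = np.median(chips, axis=0)   # typical value of each probe
    residual = chips - reference           # per-chip deviation from typical
    # Averaging over a neighbourhood suppresses isolated noisy probes and
    # keeps deviations that are shared by adjacent cells (spatial defects).
    smoothed = np.stack([box_mean(res, k) for res in residual])
    return np.abs(smoothed) > threshold


# Synthetic example: five 50 x 50 chips, with a bright blemish on chip 0.
rng = np.random.default_rng(0)
chips = rng.normal(8.0, 0.2, size=(5, 50, 50))
chips[0, 10:20, 10:20] += 3.0
flags = flag_spatial_defects(chips)
print(f"chip 0: {flags[0].mean():.2%} of probes flagged")
```

Smoothing before thresholding is what makes the detector "spatial": a single aberrant probe barely moves its neighbourhood mean, whereas a blemish covering many adjacent cells does. Real analyses would need to account for chip-wide brightness differences (for example by normalising chips first), which this sketch ignores.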