Birkner Merrill D, Hubbard Alan E, van der Laan Mark J, Skibola Christine F, Hegedus Christine M, Smith Martyn T
Division of Biostatistics, School of Public Health, University of California, Berkeley, USA.
Stat Appl Genet Mol Biol. 2006;5:Article11. doi: 10.2202/1544-6115.1198. Epub 2006 Apr 21.
A new data filtering method for SELDI-TOF MS proteomic spectra data is described. We examined technical repeats (2 per subject) of intensity versus m/z (mass/charge) of bone marrow cell lysate for two groups of childhood leukemia patients: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). As others have noted, the type of data processing as well as experimental variability can have a disproportionate impact on the list of "interesting" proteins (see Baggerly et al. (2004)). We propose a sequence of processing and multiple testing techniques: 1) correction for background drift; 2) filtering using smooth regression with cross-validated bandwidth selection; 3) peak finding; and 4) correction for multiple testing (van der Laan et al. (2005)). The result is a list of proteins (indexed by m/z) whose average expression differs significantly among disease (or treatment, etc.) groups. The procedures are intended to provide a sensible, statistically driven algorithm, which we argue yields a list of proteins with a significant difference in expression. Provided there are no sources of unmeasured bias (such as confounding of experimental conditions with disease status), proteins found to be statistically significant using this technique have a low probability of being false positives.
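The smoothing, peak-finding, and multiple-testing steps listed in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not name a smoother, so a Nadaraya-Watson estimator with a Gaussian kernel and leave-one-out cross-validation stands in for "smooth regression and cross-validated bandwidth selection"; peaks are taken as strict local maxima; and Holm's step-down procedure stands in for the multiple testing methods of van der Laan et al. (2005). Background-drift correction is omitted, and all function names are hypothetical.

```python
import numpy as np


def nw_smooth(x, y, bandwidth, leave_one_out=False):
    """Nadaraya-Watson regression with a Gaussian kernel, evaluated at each x."""
    scaled = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled ** 2)
    if leave_one_out:
        # Exclude each point from its own fit so the error is out-of-sample.
        np.fill_diagonal(weights, 0.0)
    return weights @ y / weights.sum(axis=1)


def select_bandwidth(x, y, candidates):
    """Return the candidate with the smallest leave-one-out squared error."""
    errors = [
        np.mean((y - nw_smooth(x, y, h, leave_one_out=True)) ** 2)
        for h in candidates
    ]
    return candidates[int(np.argmin(errors))]


def find_peaks(smoothed):
    """Indices of strict local maxima of a smoothed spectrum."""
    mid = smoothed[1:-1]
    is_peak = (mid > smoothed[:-2]) & (mid > smoothed[2:])
    return np.flatnonzero(is_peak) + 1


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure (a stand-in, not the paper's method)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are too
    return reject


# Synthetic spectrum (hypothetical data): two Gaussian peaks plus noise.
rng = np.random.default_rng(0)
mz = np.arange(200.0)
intensity = (
    10 * np.exp(-0.5 * ((mz - 60) / 5) ** 2)
    + 6 * np.exp(-0.5 * ((mz - 140) / 5) ** 2)
    + rng.normal(0, 0.5, mz.size)
)

h = select_bandwidth(mz, intensity, [1.0, 2.0, 3.0, 5.0, 8.0])
smoothed = nw_smooth(mz, intensity, h)
peaks = find_peaks(smoothed)
print("selected bandwidth:", h)
print("peak m/z values:", mz[peaks])
```

In a real analysis each detected peak would be tested for a group difference (for example AML versus ALL) and the resulting p-values passed to the multiple-testing step. The candidate bandwidths should be no smaller than roughly half the m/z grid spacing, since the leave-one-out weights otherwise approach zero.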