Ahmad Norhaiza, Zhang Jian, Brown Phillip J, James David C, Birch John R, Racher Andrew J, Smales C Mark
Institute of Mathematics and Statistics, University of Kent, Canterbury, Kent, UK.
Biochim Biophys Acta. 2006 Jul;1764(7):1179-87. doi: 10.1016/j.bbapap.2006.05.002. Epub 2006 May 19.
We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Because of the nature of gel-based experiments, not all protein spots are detected in every sample of an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to estimate the missing data points. Secondly, we combined singular value decomposition-based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results show that while imputation of missing data was a useful method for improving the statistical analysis of such datasets, it was of limited use in differentiating between the samples investigated. The analysis also highlighted a small number of candidate proteins for further investigation.
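The two-step workflow described in the abstract (impute missing spot intensities, then cluster spots on their singular value decomposition coordinates) can be illustrated with a minimal sketch. The abstract does not name the imputation technique or the clustering settings, so everything specific here is an assumption: row-mean imputation stands in for the unspecified method, average linkage with Euclidean distance is an arbitrary choice, the function names `impute_row_mean` and `svd_cluster` are hypothetical, and the small intensity matrix is invented for illustration. The expression variability test is not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def impute_row_mean(x):
    """Replace each NaN with the mean of the observed values in its row.

    Assumption: a simple stand-in for the unspecified imputation technique.
    Rows are protein spots, columns are cell lines.
    """
    filled = x.copy()
    row_means = np.nanmean(filled, axis=1)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = row_means[rows]
    return filled


def svd_cluster(x, n_components=2, n_clusters=2):
    """Hierarchically cluster rows on their leading SVD coordinates."""
    # Center each spot so clustering reflects the expression trend,
    # not the absolute intensity.
    centered = x - x.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each spot along the leading singular vectors.
    scores = u[:, :n_components] * s[:n_components]
    tree = linkage(scores, method="average", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Invented data: 6 spots x 4 cell lines, columns ordered by rising
# antibody production rate. NaN marks a spot not detected on a gel.
spots = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, np.nan, 3.1, 4.2],
    [0.9, 2.1, np.nan, 3.9],
    [4.0, 3.0, 2.0, 1.0],
    [4.1, 2.9, np.nan, 1.1],
    [3.9, np.nan, 2.1, 0.9],
])

complete = impute_row_mean(spots)
labels = svd_cluster(complete)
print(labels)
```

In this toy matrix the first three spots rise with productivity and the last three fall, so the two clusters separate those trends. On real gel data the choice of imputation method can change which spots cluster together, which is consistent with the abstract's caution about the limits of imputation.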