West-Nielsen Mikkel, Høgdall Estrid V, Marchiori Elena, Høgdall Claus K, Schou Christian, Heegaard Niels H H
Department of Autoimmunology, Statens Serum Institut, DK-2300 Copenhagen S, Denmark.
Anal Chem. 2005 Aug 15;77(16):5114-23. doi: 10.1021/ac050253g.
Proteomic investigations of sera are potentially of value for diagnosis, prognosis, choice of therapy, and disease activity assessment through the discovery of new biomarkers and biomarker patterns. Much debate focuses on the biological relevance of such biomarkers and the need to identify them, while less effort has been invested in devising standard procedures for sample preparation and storage in relation to model building based on complex sets of mass spectrometric (MS) data. Thus, standardized methods for collection and storage of patient samples, together with standards for transportation and handling of samples, are needed. This requires knowledge of how sample processing affects MS-based proteome analyses and thereby of how classification errors caused by nonbiological bias can be avoided. In this study, we characterize the effects of sample handling, including clotting conditions, storage temperature, storage time, and freeze/thaw cycles, on MS-based proteomics of human serum, using principal components analysis, support vector machine learning, and clustering methods based on genetic algorithms as class modeling and prediction methods. Using spiking to artificially create differentiable sample groups, this integrated approach yields data that clearly demonstrate the need for comparable sampling conditions between the samples used for modeling and the samples in the test set, even when the sample groups differ more than may be expected in biological studies. The study also emphasizes the difference between class prediction and class comparison studies, as well as the advantages and disadvantages of different modeling methods.
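The abstract's central point, that samples used for modeling and samples in the test set need comparable handling, can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration and none of it comes from the study: the peak count, the intensities, the size of the spike, and the modeling of a handling effect as a uniform attenuation of all peak intensities. A simple nearest-centroid classifier stands in for the support vector machine and genetic-algorithm methods named in the abstract.

```python
import random
from statistics import fmean

N_PEAKS = 20  # hypothetical number of peaks per spectrum


def simulate_spectrum(spiked, attenuation, rng):
    """Return one synthetic peak-intensity vector (not real MS data)."""
    peaks = [rng.gauss(10.0, 0.5) for _ in range(N_PEAKS)]
    if spiked:
        peaks[0] += 4.0  # the artificially spiked-in peak
    # Assumed handling effect: every intensity is scaled by the same factor.
    return [attenuation * p for p in peaks]


def make_set(n_per_class, attenuation, rng):
    """Build a balanced set of unspiked (0) and spiked (1) spectra."""
    spectra, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            spectra.append(simulate_spectrum(label == 1, attenuation, rng))
            labels.append(label)
    return spectra, labels


def fit_centroids(spectra, labels):
    """Compute the mean spectrum of each class."""
    return [
        [fmean(col) for col in zip(*(s for s, l in zip(spectra, labels) if l == k))]
        for k in (0, 1)
    ]


def predict(centroids, spectrum):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(spectrum, c)) for c in centroids]
    return dists.index(min(dists))


def accuracy(centroids, spectra, labels):
    hits = sum(predict(centroids, s) == l for s, l in zip(spectra, labels))
    return hits / len(labels)


rng = random.Random(0)

# Train on samples handled under one condition (no attenuation).
train_x, train_y = make_set(40, 1.0, rng)
centroids = fit_centroids(train_x, train_y)

# Test set handled like the training set.
same_x, same_y = make_set(40, 1.0, rng)
# Test set handled differently (assumed 40% signal loss).
diff_x, diff_y = make_set(40, 0.6, rng)

acc_matched = accuracy(centroids, same_x, same_y)
acc_mismatched = accuracy(centroids, diff_x, diff_y)
print(f"matched handling:    {acc_matched:.2f}")
print(f"mismatched handling: {acc_mismatched:.2f}")
```

In this toy setup the two groups are easy to separate when handling matches. When the test set is attenuated, the spiked peak falls closer to the unspiked centroid, so accuracy should drop toward chance even though the underlying group difference is unchanged.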