Statistical data processing in clinical proteomics.

Author information

Smit Suzanne, Hoefsloot Huub C J, Smilde Age K

Affiliation

Swammerdam Institute for Life Sciences, Universiteit van Amsterdam - Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands.

Publication information

J Chromatogr B Analyt Technol Biomed Life Sci. 2008 Apr 15;866(1-2):77-88. doi: 10.1016/j.jchromb.2007.10.042. Epub 2007 Nov 4.

DOI:10.1016/j.jchromb.2007.10.042

PMID:18033744

Abstract

This review discusses data analysis strategies for the discovery of biomarkers in clinical proteomics. Proteomics studies produce large amounts of data, characterized by few samples on which many variables are measured. A wealth of classification methods exists for extracting information from the data. Feature selection plays an important role in reducing the dimensionality of the data prior to classification and in discovering biomarker leads. The question of which classification strategy works best is as yet unanswered. Validation is a crucial step for biomarker leads towards clinical use. Here we only discuss statistical validation, recognizing that biological and clinical validation is of utmost importance. First, there is the need for validated model selection to develop a generalized classifier that predicts new samples correctly. A cross-validation loop that is wrapped around the model development procedure assesses the performance using unseen data. The significance of the model should be tested; we use permutations of the data for comparison with uninformative data. This procedure also tests the correctness of the performance validation. Preferably, a new set of samples is measured to test the classifier and rule out results specific to a machine, analyst, laboratory or the first set of samples. This is not yet standard practice. We present a modular framework that combines feature selection, classification, biomarker discovery and statistical validation; these data analysis aspects are all discussed in this review. The feature selection, classification and biomarker discovery modules can be incorporated or omitted according to the preference of the researcher. The validation modules, however, should not be optional. In each module, the researcher can select from a wide range of methods, since there is not one unique way that leads to the correct model and proper validation. We discuss many possibilities for feature selection, classification and biomarker discovery. For validation we advise a combination of cross-validation and permutation testing, a validation strategy supported in the literature.
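The validation strategy named in the abstract (a cross-validation loop wrapped around the whole model development procedure, followed by a permutation test) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the review: the synthetic data, the nearest-centroid classifier, the mean-difference feature ranking, and all function names and parameter values. The point it tries to show is only that feature selection is repeated inside each training fold, and that the same procedure is rerun on label-permuted data to obtain a reference distribution.

```python
import math
import random


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def select_features(X, y, k):
    """Rank features by class-mean difference scaled by spread; return top-k indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(_mean(a) - _mean(b)) / (_std(a) + _std(b) + 1e-12))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_centroids(X, y, feats):
    """Illustrative classifier: one centroid per class on the selected features."""
    cents = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [_mean([r[j] for r in rows]) for j in feats]
    return cents


def predict(cents, row, feats):
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
    return min(cents, key=dist)


def cv_accuracy(X, y, k_feats=5, n_folds=5, seed=0):
    """Cross-validated accuracy; assumes both classes occur in every training fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Feature selection sees the training fold only, never the held-out samples.
        feats = select_features(X_tr, y_tr, k_feats)
        cents = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(cents, X[i], feats) == y[i] for i in test)
    return correct / len(y)


def permutation_p_value(X, y, n_perm=100, seed=0, **cv_kwargs):
    """Compare the observed CV accuracy with accuracies on label-permuted data."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, **cv_kwargs)
    at_least_as_good = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks any real link between samples and labels
        if cv_accuracy(X, y_perm, **cv_kwargs) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_perm + 1)


def make_data(n=40, p=50, shift=2.0, seed=1):
    """Hypothetical data: few samples, many variables, three informative features."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * lab if j < 3 else 0.0, 1.0) for j in range(p)] for lab in y]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    acc, p = permutation_p_value(X, y)
    print(f"cross-validated accuracy: {acc:.2f}, permutation p-value: {p:.3f}")
```

In this sketch the accuracies on permuted labels should scatter around chance level. If they were clearly above chance, that would suggest information is leaking from the held-out samples into model development, which is one way to read the abstract's remark that the permutation procedure also tests the correctness of the performance validation.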

Similar articles

Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics.

Prog Brain Res. 2006;158:83-108. doi: 10.1016/S0079-6123(06)58004-5.

Assessing the statistical validity of proteomics based biomarkers.

Anal Chim Acta. 2007 Jun 5;592(2):210-7. doi: 10.1016/j.aca.2007.04.043. Epub 2007 Apr 27.

Proteomic data analysis workflow for discovery of candidate biomarker peaks predictive of clinical outcome for patients with acute myeloid leukemia.

J Proteome Res. 2008 Jun;7(6):2332-41. doi: 10.1021/pr070482e. Epub 2008 May 2.

A classification model for the Leiden proteomics competition.

Stat Appl Genet Mol Biol. 2008;7(2):Article8. doi: 10.2202/1544-6115.1351. Epub 2008 Feb 19.

A cross-validation study to select a classification procedure for clinical diagnosis based on proteomic mass spectrometry.

Stat Appl Genet Mol Biol. 2008;7(2):Article12. doi: 10.2202/1544-6115.1363. Epub 2008 Mar 24.

What does it need to be a biomarker? Relationships between resolution, differential quantification and statistical validation of protein surrogate biomarkers.

Electrophoresis. 2007 Jun;28(12):1970-9. doi: 10.1002/elps.200600752.

So, you want to look for biomarkers (introduction to the special biomarkers issue).

J Proteome Res. 2005 Jul-Aug;4(4):1053-9. doi: 10.1021/pr0501259.

High-performance proteomics as a tool in biomarker discovery.

Proteomics. 2007 Sep;7 Suppl 1:18-26. doi: 10.1002/pmic.200700183.

Mass spectrometry is only one piece of the puzzle in clinical proteomics.

Brief Funct Genomic Proteomic. 2008 Jan;7(1):74-83. doi: 10.1093/bfgp/eln005. Epub 2008 Feb 28.

Cited by

Assessing Milk Authenticity Using Protein and Peptide Biomarkers: A Decade of Progress in Species Differentiation and Fraud Detection.

Foods. 2025 Jul 23;14(15):2588. doi: 10.3390/foods14152588.

MultiOmicsAgent: Guided Extreme Gradient-Boosted Decision Trees-Based Approaches for Biomarker-Candidate Discovery in Multiomics Data.

J Proteome Res. 2025 Jun 6;24(6):2816-2831. doi: 10.1021/acs.jproteome.4c01066. Epub 2025 May 25.

MALDI-TOF analysis of blood serum proteome can predict the presence of monoclonal gammopathy of undetermined significance.

PLoS One. 2018 Aug 2;13(8):e0201793. doi: 10.1371/journal.pone.0201793. eCollection 2018.

Integrated Chemometrics and Statistics to Drive Successful Proteomics Biomarker Discovery.

Proteomes. 2018 Apr 26;6(2):20. doi: 10.3390/proteomes6020020.

Preliminary analysis of the protein profile in saliva during physiological term and preterm delivery.

Mol Med Rep. 2018 Jun;17(6):8253-8259. doi: 10.3892/mmr.2018.8909. Epub 2018 Apr 20.

Integration of Proteomics and Metabolomics in Exploring Genetic and Rare Metabolic Diseases.

Kidney Dis (Basel). 2017 Jul;3(2):66-77. doi: 10.1159/000477493. Epub 2017 Jun 30.

Insights into psychosis risk from leukocyte microRNA expression.

Transl Psychiatry. 2016 Dec 13;6(12):e981. doi: 10.1038/tp.2016.148.

Integrative analysis to select cancer candidate biomarkers to targeted validation.

Oncotarget. 2015 Dec 22;6(41):43635-52. doi: 10.18632/oncotarget.6018.

Severity of thought disorder predicts psychosis in persons at clinical high-risk.

Schizophr Res. 2015 Dec;169(1-3):169-177. doi: 10.1016/j.schres.2015.09.008. Epub 2015 Oct 4.

Proteomic approaches to identify circulating biomarkers in patients with abdominal aortic aneurysm.

Am J Cardiovasc Dis. 2015 Sep 15;5(3):140-5. eCollection 2015.
