Assessing the performance of statistical validation tools for megavariate metabolomics data.

Authors

Rubingh Carina M, Bijlsma Sabina, Derks Eduard P P A, Bobeldijk Ivana, Verheij Elwin R, Kochhar Sunil, Smilde Age K

Affiliations

Business Unit Analytical Sciences, TNO Quality of Life, P.O. Box 360, 3700 AJ Zeist, The Netherlands.

BioAnalytical Science Department, Nestlé Research Center, P.O. Box 44, CH-1000 Lausanne 26, Switzerland.

Publication information

Metabolomics. 2006;2(2):53-61. doi: 10.1007/s11306-006-0022-6. Epub 2006 Jul 11.


DOI:10.1007/s11306-006-0022-6
PMID:24489531
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3906710/
Abstract

Statistical model validation tools such as cross-validation, jack-knifing model parameters and permutation tests are meant to obtain an objective assessment of the performance and stability of a statistical model. However, little is known about the performance of these tools for megavariate data sets, having, for instance, a number of variables larger than 10 times the number of subjects. The performance is assessed for megavariate metabolomics data, but the conclusions also carry over to proteomics, transcriptomics and many other research areas. Partial least squares discriminant analysis models were built for several LC-MS lipidomic training data sets of various numbers of lean and obese subjects. The training data sets were compared on their modelling performance and their predictability using a 10-fold cross-validation, a permutation test, and test data sets. A wide range of cross-validation error rates was found (from 7.5% to 16.3% for the largest training set and from 0% to 60% for the smallest training set) and the error rate increased when the number of subjects decreased. The test error rates varied from 5% to 50%. The smaller the number of subjects compared to the number of variables, the less the outcome of validation tools such as cross-validation, jack-knifing model parameters and permutation tests can be trusted. The result depends crucially on the specific sample of subjects that is used for modelling. The validation tools cannot be used as a warning mechanism for problems due to sample size or to representativity of the sampling.
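The two validation tools named in the abstract, 10-fold cross-validation and a permutation test, can be illustrated with a minimal sketch. It is not the authors' method: a nearest-centroid classifier stands in for PLS-DA, the data are simulated Gaussian noise with a small group shift rather than LC-MS lipidomic measurements, and every function name and parameter value below is a hypothetical choice made for illustration.

```python
import random

def make_data(n_per_class, n_vars, n_informative, shift, rng):
    """Simulate two groups; only the first n_informative variables differ in mean.
    (Assumption: Gaussian noise, not real lipidomics data.)"""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                      for j in range(n_vars)])
            y.append(label)
    return X, y

def centroid(rows):
    """Column-wise mean of a list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def cv_error(X, y, k, rng):
    """k-fold cross-validation error rate with a random fold assignment.
    The classifier is nearest-centroid, a simple stand-in for PLS-DA."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on the training part only: one centroid per class present.
        cents = {lab: centroid([X[i] for i in train if y[i] == lab])
                 for lab in {y[i] for i in train}}
        for i in fold:
            pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
            errors += pred != y[i]
    return errors / len(y)

def permutation_p_value(X, y, k, n_perm, rng):
    """Compare the observed CV error with CV errors under shuffled labels."""
    observed = cv_error(X, y, k, rng)
    as_good = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if cv_error(X, y_perm, k, rng) <= observed:
            as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (as_good + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = random.Random(0)
    # Draw several independent samples of subjects at two sample sizes,
    # with 200 variables each, and report the spread of the CV error.
    for n_per_class in (5, 40):
        rates = [cv_error(*make_data(n_per_class, 200, 10, 0.8, rng), 10, rng)
                 for _ in range(10)]
        print(n_per_class * 2, "subjects: CV error from", min(rates), "to", max(rates))
```

Running the script prints the range of cross-validation error rates over repeated draws of subjects at each sample size, which is one way to examine how strongly the estimate depends on the particular sample. The function `permutation_p_value` can be called on any single simulated data set to see how the observed error compares with errors obtained under shuffled class labels.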

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f69/3906710/3e76a65db9d1/11306_2006_22_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f69/3906710/c1be37d0b21b/11306_2006_22_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f69/3906710/d1762aae374b/11306_2006_22_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f69/3906710/25faa8c546ee/11306_2006_22_Fig4_HTML.jpg

Similar articles

[1]
Assessing the performance of statistical validation tools for megavariate metabolomics data.

Metabolomics. 2006

[2]
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

2015

[3]
Statistical validation of megavariate effects in ASCA.

BMC Bioinformatics. 2007-8-30

[4]
Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies.

Metabolomics. 2012-6

[5]
PLS/OPLS models in metabolomics: the impact of permutation of dataset rows on the K-fold cross-validation quality parameters.

Mol Biosyst. 2015-1

[6]
Megavariate analysis of environmental QSAR data. Part I--a basic framework founded on principal component analysis (PCA), partial least squares (PLS), and statistical molecular design (SMD).

Mol Divers. 2006-5

[7]
A tutorial review: Metabolomics and partial least squares-discriminant analysis--a marriage of convenience or a shotgun wedding.

Anal Chim Acta. 2015-6-16

[8]
Assessing the statistical validity of proteomics based biomarkers.

Anal Chim Acta. 2007-6-5

[9]
Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation.

Anal Chem. 2006-1-15

[10]
Improved variable reduction in partial least squares modelling by Global-Minimum Error Uninformative-Variable Elimination.

Anal Chim Acta. 2017-6-16

Cited by

[1]
Molecular Determinants in Seminal Plasma and Spermatozoa: Nontargeted Metabolomics.

Methods Mol Biol. 2025

[2]
Metabolomics of early blight (Alternaria solani) susceptible tomato (Solanum lycopersicum) unfolds key biomarker metabolites and involved metabolic pathways.

Sci Rep. 2023-11-29

[3]
From big data to big insights: statistical and bioinformatic approaches for exploring the lipidome.

Anal Bioanal Chem. 2024-4

[4]
Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for Forensic Screening of Long-Term Alcohol Consumption from Human Nails.

ACS Omega. 2023-6-6

[5]
Multiomics characterization of methicillin-resistant Staphylococcus aureus (MRSA) isolates with heterogeneous intermediate resistance to vancomycin (hVISA) in Latin America.

J Antimicrob Chemother. 2022-12-23

[6]
A targeted metabolic analysis of football players and its association to player load: Comparison between women and men profiles.

Front Physiol. 2022-9-30

[7]
New bladder cancer non-invasive surveillance method based on voltammetric electronic tongue measurement of urine.

iScience. 2022-8-4

[8]
Metabolomics in clinical and forensic toxicology, sports anti-doping and veterinary residues.

Drug Test Anal. 2022-5

[9]
Prediction of radiation pneumonitis with machine learning using 4D-CT based dose-function features.

J Radiat Res. 2022-1-20

[10]
Discrimination of the Geographical Origin of Soybeans Using NMR-Based Metabolomics.

Foods. 2021-2-17
