Evaluation of the effect of chance correlations on variable selection using Partial Least Squares-Discriminant Analysis.

Affiliation

Neonatal Research Centre, Health Research Institute La Fé, 46009 Valencia, Spain.

Publication information

Talanta. 2013 Nov 15;116:835-40. doi: 10.1016/j.talanta.2013.07.048. Epub 2013 Aug 9.

Abstract

Variable subset selection is often mandatory in high-throughput metabolomics and proteomics. However, depending on the variable-to-sample ratio, variable selection can be highly susceptible to chance correlations. If the selection is performed on the entire data set and no external validation set is available, the predictive capability of PLSDA models estimated by cross-validation after feature selection is overly optimistic. In this work, a simulation of the statistical null hypothesis is proposed to test whether the discrimination capability of a PLSDA model after variable selection, as estimated by cross-validation, is statistically higher than that attributable to chance correlations in the original data set. The statistical significance of the PLSDA CV figures of merit obtained after variable selection is expressed as p-values calculated with a permutation test that includes the variable selection step. The reliability of the approach is evaluated using two variable selection methods on experimental and simulated data sets with and without induced class differences. The proposed approach can be a useful tool when no external validation set is available, and it provides a straightforward way to evaluate differences between variable selection methods.
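The permutation scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the two variable selection methods, the PLSDA settings or the exact CV figures of merit, so the sketch assumes a hypothetical top-k absolute-correlation filter, a NIPALS PLS1 model on 0/1-coded classes with a 0.5 decision threshold, and k-fold CV accuracy as the figure of merit. All function names are hypothetical. The essential point is that every permutation of the class labels repeats the variable selection on the entire (permuted) set, so the null distribution carries the same chance-correlation optimism as the observed result.

```python
import numpy as np


def select_top_k(X, y, k):
    """Hypothetical filter (assumption): keep the k variables with largest |Pearson r| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[::-1][:k]


def pls1_fit(X, y, n_comp):
    """PLS1 regression by NIPALS; returns centring terms and regression coefficients B."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm == 0:  # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                # scores
        tt = t @ t
        p = Xr.T @ t / tt         # X loadings
        q = (yr @ t) / tt         # y loading
        Xr = Xr - np.outer(t, p)  # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q
    return x_mean, y_mean, B


def cv_accuracy(X, y, n_comp, n_folds, seed):
    """k-fold CV accuracy of PLS-DA (0/1 class coding, 0.5 threshold)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        x_mean, y_mean, B = pls1_fit(X[train], y[train], n_comp)
        y_hat = (X[test] - x_mean) @ B + y_mean
        correct += np.sum((y_hat > 0.5) == (y[test] == 1))
    return correct / len(y)


def selected_cv_accuracy(X, y, k, n_comp, n_folds, seed):
    """Selection on the ENTIRE set, then CV: the optimistic scheme the abstract warns about."""
    cols = select_top_k(X, y, k)
    return cv_accuracy(X[:, cols], y, n_comp, n_folds, seed)


def permutation_p_value(X, y, k=10, n_comp=2, n_folds=5, n_perm=200, seed=0):
    """Permutation test in which every permutation repeats the variable selection step."""
    observed = selected_cv_accuracy(X, y, k, n_comp, n_folds, seed)
    rng = np.random.default_rng(seed + 1)
    null = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)
        # same fold assignment (seed) as the observed run; selection redone on permuted labels
        null[i] = selected_cv_accuracy(X, y_perm, k, n_comp, n_folds, seed)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y = np.repeat([0, 1], 20)                  # 40 samples, two classes
    X_null = rng.standard_normal((40, 1000))   # no real class difference
    X_sig = X_null.copy()
    X_sig[y == 1, :20] += 3.0                  # induced difference in 20 variables
    for name, X in [("null", X_null), ("signal", X_sig)]:
        obs, null, p = permutation_p_value(X, y, n_perm=100)
        print(f"{name}: CV accuracy={obs:.2f}, mean null={null.mean():.2f}, p={p:.3f}")
```

On data without a class difference, the observed CV accuracy after full-set selection will typically sit well above 0.5, yet it should fall inside the permutation null, so the p-value is not systematically small; with the induced difference, the observed accuracy should exceed the null distribution. Reusing the same fold assignment for the observed and permuted runs keeps the comparison paired.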

Similar articles

A tutorial review: Metabolomics and partial least squares-discriminant analysis--a marriage of convenience or a shotgun wedding.

Anal Chim Acta. 2015 Jun 16;879:10-23. doi: 10.1016/j.aca.2015.02.012. Epub 2015 Feb 11.

Application of Discriminant Analysis and Cross-Validation on Proteomics Data.

Methods Mol Biol. 2016;1362:175-84. doi: 10.1007/978-1-4939-3106-4_11.

Assessment of the statistical significance of classifications in infrared spectroscopy based diagnostic models.

Analyst. 2015 Apr 7;140(7):2422-7. doi: 10.1039/c4an01783h.

Combining NMR and LC/MS Using Backward Variable Elimination: Metabolomics Analysis of Colorectal Cancer, Polyps, and Healthy Controls.

Anal Chem. 2016 Aug 16;88(16):7975-83. doi: 10.1021/acs.analchem.6b00885. Epub 2016 Aug 1.

Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data.

Anal Chim Acta. 2011 Apr 29;692(1-2):63-72. doi: 10.1016/j.aca.2011.03.006. Epub 2011 Mar 8.

Variable selection and interpretation in structure-affinity correlation modeling of estrogen receptor binders.

J Chem Inf Model. 2005 Nov-Dec;45(6):1507-19. doi: 10.1021/ci0501645.

Discriminating variable test and selectivity ratio plot: quantitative tools for interpretation and variable (biomarker) selection in complex spectral or chromatographic profiles.

Anal Chem. 2009 Apr 1;81(7):2581-90. doi: 10.1021/ac802514y.

The Monte Carlo validation framework for the discriminant partial least squares model extended with variable selection methods applied to authenticity studies of Viagra® based on chromatographic impurity profiles.

Analyst. 2016 Feb 7;141(3):1060-70. doi: 10.1039/c5an01656h. Epub 2016 Jan 5.

Effects of nonlinearities and uncorrelated or correlated errors in realistic simulated data on the prediction abilities of augmented classical least squares and partial least squares.

Appl Spectrosc. 2004 Sep;58(9):1065-73. doi: 10.1366/0003702041959334.

Cited by

An ensemble variable selection method for vibrational spectroscopic data analysis.

RSC Adv. 2019 Feb 26;9(12):6708-6716. doi: 10.1039/c8ra08754g. eCollection 2019 Feb 22.

Transcriptome profiles discriminate between Gram-positive and Gram-negative sepsis in preterm neonates.

Pediatr Res. 2022 Feb;91(3):637-645. doi: 10.1038/s41390-021-01444-3. Epub 2021 Mar 25.

Comparative Analysis of Chemical Constituents of Leaves from China and India by Ultra-Performance Liquid Chromatography Coupled with Quadrupole-Time-Of-Flight Mass Spectrometry.

Molecules. 2019 Mar 7;24(5):942. doi: 10.3390/molecules24050942.

Serum Metabolomics Analysis of Asthma in Different Inflammatory Phenotypes: A Cross-Sectional Study in Northeast China.

Biomed Res Int. 2018 Sep 23;2018:2860521. doi: 10.1155/2018/2860521. eCollection 2018.