So you think you can PLS-DA?

Affiliations

Bioinformatics Research Group (BioRG), Florida International University, 11200 SW 8th St, Miami, 33199, FL, USA.

Department of Epidemiology, Florida International University, 11200 SW 8th St, Miami, 33199, FL, USA.

Publication information

BMC Bioinformatics. 2020 Dec 9;21(Suppl 1):2. doi: 10.1186/s12859-019-3310-7.

DOI:10.1186/s12859-019-3310-7

PMID:33297937

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7724830/

Abstract

BACKGROUND

Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to that of Principal Component Analysis (PCA), the close relative from which it was originally derived.

RESULTS

We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task, to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper as well as the supplementary ones can be viewed interactively at http://biorg.cs.fiu.edu/plsda.

CONCLUSIONS

Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.
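The kind of comparison the abstract describes can be illustrated with a small sketch. It is not the authors' experimental code: the data model (two classes, three shifted "signal" features among twenty, unit Gaussian noise), the sample sizes, and all function names are assumptions made for illustration. PLS-DA is reduced here to its first component for a single 0/1 class column, whose weight vector is proportional to X^T y after centering, and PCA is reduced to its first principal axis, found by power iteration. Features are then ranked by the absolute value of their weight or loading.

```python
import random


def center_columns(X):
    """Subtract each column mean so that every feature has mean zero."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def pls_first_weights(Xc, y):
    """First PLS weight vector for one response column: w is proportional to X^T y."""
    y_mean = sum(y) / len(y)
    yc = [yi - y_mean for yi in y]
    p = len(Xc[0])
    return normalize([sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(p)])


def pca_first_loadings(Xc, n_iter=200):
    """First principal axis of centered data via power iteration on X^T X."""
    p = len(Xc[0])
    v = normalize([1.0] * p)
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]  # X v
        v = normalize([sum(row[j] * s for row, s in zip(Xc, scores)) for j in range(p)])
    return v


def top_features(weights, k):
    """Indices of the k features with the largest absolute weight, sorted."""
    ranked = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical synthetic data: features 0..2 carry the class signal, the rest are noise.
rng = random.Random(0)
n_per_class, n_features, n_signal, shift = 50, 20, 3, 2.0
X, y = [], []
for label in (0, 1):
    mean = shift if label == 1 else -shift
    for _ in range(n_per_class):
        X.append([rng.gauss(mean if j < n_signal else 0.0, 1.0) for j in range(n_features)])
        y.append(label)

Xc = center_columns(X)
pls_selected = top_features(pls_first_weights(Xc, y), n_signal)  # uses the labels
pca_selected = top_features(pca_first_loadings(Xc), n_signal)    # ignores the labels
print("PLS-DA top features:", pls_selected)
print("PCA top features:   ", pca_selected)
```

With a class shift this large relative to the noise, the between-class difference dominates the variance of the signal features, so the label-free PCA ranking should pick out the same features as the supervised one. Shrinking `shift` or adding high-variance noise features is one way to explore settings where the two rankings diverge.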

Similar articles

Classification of structurally related commercial contrast media by near infrared spectroscopy.

J Pharm Biomed Anal. 2014 Mar;90:148-60. doi: 10.1016/j.jpba.2013.11.033. Epub 2013 Dec 7.

A tutorial review: Metabolomics and partial least squares-discriminant analysis--a marriage of convenience or a shotgun wedding.

Anal Chim Acta. 2015 Jun 16;879:10-23. doi: 10.1016/j.aca.2015.02.012. Epub 2015 Feb 11.

Selecting subsets of newly extracted features from PCA and PLS in microarray data analysis.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S24. doi: 10.1186/1471-2164-9-S2-S24.

Approaches to Sample Size Determination for Multivariate Data: Applications to PCA and PLS-DA of Omics Data.

J Proteome Res. 2016 Aug 5;15(8):2379-93. doi: 10.1021/acs.jproteome.5b01029. Epub 2016 Jul 7.

Overoptimism in cross-validation when using partial least squares-discriminant analysis for omics data: a systematic study.

Anal Bioanal Chem. 2018 Sep;410(23):5981-5992. doi: 10.1007/s00216-018-1217-1. Epub 2018 Jun 29.

Scores selection via Fisher's discriminant power in PCA-LDA to improve the classification of food data.

Food Chem. 2021 Nov 30;363:130296. doi: 10.1016/j.foodchem.2021.130296. Epub 2021 Jun 5.

Development and validation of a Partial Least Squares-Discriminant Analysis (PLS-DA) model based on the determination of ethyl glucuronide (EtG) and fatty acid ethyl esters (FAEEs) in hair for the diagnosis of chronic alcohol abuse.

Forensic Sci Int. 2018 Jan;282:221-230. doi: 10.1016/j.forsciint.2017.11.010. Epub 2017 Nov 12.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.

Complex Chemical Data Classification and Discrimination Using Locality Preserving Partial Least Squares Discriminant Analysis.

ACS Omega. 2020 Oct 9;5(41):26601-26610. doi: 10.1021/acsomega.0c03362. eCollection 2020 Oct 20.

Cited by

Regulatory Effects of Alkali-Extracted Polysaccharide Induced Intestinal Enrichment on Peripheral Blood Proteomics in Tumor-Bearing Mice.

Microorganisms. 2025 Jul 26;13(8):1750. doi: 10.3390/microorganisms13081750.

Pre-diagnostic serum metabolome and breast cancer risk: a nested case-control study.

Breast Cancer Res. 2025 Aug 27;27(1):156. doi: 10.1186/s13058-025-02102-w.

Lipidomics analysis of phospholipid profiles and oxidative stability in pan-fried beef patties incorporating sacha inchi leaf extracts.

Sci Rep. 2025 Aug 2;15(1):28233. doi: 10.1038/s41598-025-13267-x.

Mass Spectrometric Fingerprinting to Detect Fraud and Herbal Adulteration in Plant Food Supplements.

Molecules. 2025 Jul 17;30(14):3001. doi: 10.3390/molecules30143001.

Machine learning model interpretability using SHAP values: Applied to the task of classifying and predicting the nutritional content of different cuts of mutton.

Food Chem X. 2025 Jul 4;29:102739. doi: 10.1016/j.fochx.2025.102739. eCollection 2025 Jul.

A murine model lacking Lyst recapitulates Chediak-Higashi syndrome with an earlier-onset neurodegenerative phenotype.

Commun Biol. 2025 Jul 18;8(1):1064. doi: 10.1038/s42003-025-08482-1.

Differential intestinal microbiome response to heat stress in two rabbit maternal lines: a comparative analysis using Random Forest, BayesC, and PLS-DA.

J Anim Sci. 2025 Jan 4;103. doi: 10.1093/jas/skaf206.

On Selecting Robust Approaches for Learning Predictive Biomarkers in Metabolomics Data Sets.

Anal Chem. 2025 Jun 24;97(24):12669-12678. doi: 10.1021/acs.analchem.5c01049. Epub 2025 Jun 12.

Study on Liver Sinusoidal Endothelial Cell Fenestrations Based on Cellular Omics-Structure Integration Technology and Its Application in Metabolic Diseases.

bioRxiv. 2025 May 19:2025.05.16.653525. doi: 10.1101/2025.05.16.653525.

Multi-omics profiling of cross-resistance between ceftazidime-avibactam and meropenem identifies common and strain-specific mechanisms in clinical isolates.

mBio. 2025 Jul 9;16(7):e0389624. doi: 10.1128/mbio.03896-24. Epub 2025 Jun 4.

References

mixOmics: An R package for 'omics feature selection and multiple data integration.

PLoS Comput Biol. 2017 Nov 3;13(11):e1005752. doi: 10.1371/journal.pcbi.1005752. eCollection 2017 Nov.

Quantifying the human vaginal community state types (CSTs) with the species specificity index.

PeerJ. 2017 Jun 27;5:e3366. doi: 10.7717/peerj.3366. eCollection 2017.

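Approaches to Sample Size Determination for Multivariate Data: Applications to PCA and PLS-DA of Omics Data.

J Proteome Res. 2016 Aug 5;15(8):2379-93. doi: 10.1021/acs.jproteome.5b01029. Epub 2016 Jul 7.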

Microbial ecosystems are dominated by specialist taxa.

Ecol Lett. 2015 Sep;18(9):974-82. doi: 10.1111/ele.12478. Epub 2015 Aug 6.

Multivariate Analysis in Metabolomics.

Curr Metabolomics. 2013;1(1):92-107. doi: 10.2174/2213235X11301010092.

A critical assessment of feature selection methods for biomarker discovery in clinical proteomics.

Mol Cell Proteomics. 2013 Jan;12(1):263-76. doi: 10.1074/mcp.M112.022566. Epub 2012 Oct 31.

Utilities for quantifying separation in PCA/PLS-DA scores plots.

Anal Biochem. 2013 Feb 15;433(2):102-4. doi: 10.1016/j.ab.2012.10.011. Epub 2012 Oct 15.

Vaginal microbiome: rethinking health and disease.

Annu Rev Microbiol. 2012;66:371-89. doi: 10.1146/annurev-micro-092611-150157. Epub 2012 Jun 28.

Temporal dynamics of the human vaginal microbiota.

Sci Transl Med. 2012 May 2;4(132):132ra52. doi: 10.1126/scitranslmed.3003605.

Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.

BMC Bioinformatics. 2011 Jun 22;12:253. doi: 10.1186/1471-2105-12-253.
