Determining the number of components in PLS regression on incomplete data set.

Authors

Nengsih Titin Agustin, Bertrand Frédéric, Maumy-Bertrand Myriam, Meyer Nicolas

Affiliations

IRMA, CNRS UMR 7501, Université de Strasbourg, 67084 Strasbourg, Cedex, France.

iCUBE, CNRS UMR 7357, Université de Strasbourg, 67400 Strasbourg, France.

Publication information

Stat Appl Genet Mol Biol. 2019 Nov 6;18(6):/j/sagmb.2019.18.issue-6/sagmb-2018-0059/sagmb-2018-0059.xml. doi: 10.1515/sagmb-2018-0059.

DOI: 10.1515/sagmb-2018-0059
PMID: 31693499

Abstract

Partial least squares (PLS) regression is a multivariate method in which the model parameters are estimated using either the SIMPLS or the NIPALS algorithm. PLS regression has been extensively used in applied research because of its effectiveness in analyzing relationships between an outcome and one or several components. Notably, the NIPALS algorithm can provide parameter estimates on incomplete data. The selection of the number of components used to build a representative model in PLS regression is a central issue. However, how to deal with missing data when using PLS regression remains a matter of debate. Several approaches to component selection have been proposed in the literature, including the Q2 criterion and the AIC and BIC criteria. Here we study the behavior of the NIPALS algorithm when used to fit a PLS regression for various proportions of missing data and different types of missingness. We compare criteria for selecting the number of components of a PLS regression on incomplete data sets and on data sets imputed using three methods: multiple imputation by chained equations, k-nearest neighbour imputation, and singular value decomposition imputation. We tested the criteria with proportions of missing data ranging from 5% to 50% under different missingness assumptions. Q2 leave-one-out component selection gave more reliable results than the AIC- and BIC-based methods.
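To make the idea concrete, here is a minimal Python sketch of a single-response NIPALS fit that skips missing cells, followed by a leave-one-out Q2 computation. It is an illustration under stated assumptions, not the authors' implementation: the function names (`nipals_pls1`, `predict`, `q2_loo`) are hypothetical, the response is assumed complete, Q2 for component h is taken as 1 - PRESS_h / RSS_(h-1), and the 0.0975 cut-off is a commonly used convention that the abstract does not state.

```python
import numpy as np

def nipals_pls1(X, y, n_comp):
    """Fit PLS1 by NIPALS; NaN cells of X are left out of every sum."""
    obs = ~np.isnan(X)                      # True where x_ij is observed
    M = obs.astype(float)
    x_mean, y_mean = np.nanmean(X, axis=0), y.mean()
    E = np.where(obs, X - x_mean, 0.0)      # centred X, missing cells zeroed
    f = y - y_mean                          # centred response (assumed complete)
    W, P, C, T = [], [], [], []
    for _ in range(n_comp):
        w = (E.T @ f) / (M.T @ f**2)        # per-column slope on f, observed rows only
        w /= np.linalg.norm(w)
        t = (E @ w) / (M @ w**2)            # scores from observed columns only
        p = (E.T @ t) / (M.T @ t**2)        # loadings from observed rows only
        c = (f @ t) / (t @ t)
        E = np.where(obs, E - np.outer(t, p), 0.0)   # deflate X
        f = f - c * t                                # deflate y
        W.append(w); P.append(p); C.append(c); T.append(t)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": np.array(W),
            "P": np.array(P), "c": np.array(C), "T": np.array(T)}

def predict(model, X, n_comp):
    """Predict y from the first n_comp components; X may contain NaN."""
    obs = ~np.isnan(X)
    M = obs.astype(float)
    E = np.where(obs, X - model["x_mean"], 0.0)
    y_hat = np.full(X.shape[0], model["y_mean"])
    for w, p, c in zip(model["W"][:n_comp], model["P"][:n_comp], model["c"][:n_comp]):
        t = (E @ w) / (M @ w**2)
        y_hat = y_hat + c * t
        E = np.where(obs, E - np.outer(t, p), 0.0)
    return y_hat

def q2_loo(X, y, max_comp):
    """Q2_h = 1 - PRESS_h / RSS_(h-1), with PRESS from leave-one-out refits."""
    n = len(y)
    press = np.zeros(max_comp)
    for i in range(n):
        keep = np.arange(n) != i
        model = nipals_pls1(X[keep], y[keep], max_comp)
        for h in range(1, max_comp + 1):
            press[h - 1] += (y[i] - predict(model, X[i:i + 1], h)[0]) ** 2
    full = nipals_pls1(X, y, max_comp)
    # RSS_0 .. RSS_(max_comp-1) from the fit on all rows
    rss = np.array([np.sum((y - predict(full, X, h)) ** 2) for h in range(max_comp)])
    return 1.0 - press / rss

# Toy data (illustrative only): one latent factor drives X and y.
rng = np.random.default_rng(0)
latent = rng.normal(size=40)
X = np.outer(latent, [1.0, 0.8, -0.6, 0.5, 0.9]) + 0.1 * rng.normal(size=(40, 5))
y = 2.0 * latent + 0.1 * rng.normal(size=40)
X_miss = X.copy()
for i in range(0, 40, 3):                   # deterministic missingness pattern
    X_miss[i, i % 5] = np.nan

q2 = q2_loo(X_miss, y, max_comp=3)
below = q2 < 0.0975                         # assumed cut-off, see lead-in
n_selected = int(np.argmax(below)) if below.any() else len(q2)
print("Q2 per component:", np.round(q2, 3), "-> components kept:", n_selected)
```

The sketch keeps leading components until Q2 first drops below the cut-off. Comparing this rule on `X_miss` against the same rule applied after imputing `X_miss` would mirror, in miniature, the kind of comparison the abstract describes.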

Similar articles

1
Determining the number of components in PLS regression on incomplete data set.
Stat Appl Genet Mol Biol. 2019 Nov 6;18(6):/j/sagmb.2019.18.issue-6/sagmb-2018-0059/sagmb-2018-0059.xml. doi: 10.1515/sagmb-2018-0059.
2
A nonparametric multiple imputation approach for missing categorical data.
BMC Med Res Methodol. 2017 Jun 6;17(1):87. doi: 10.1186/s12874-017-0360-2.
3
Robust imputation method for missing values in microarray data.
BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S6. doi: 10.1186/1471-2105-8-S2-S6.
4
Dealing with gene expression missing data.
Syst Biol (Stevenage). 2006 May;153(3):105-19. doi: 10.1049/ip-syb:20050056.
5
Effects of nonlinearities and uncorrelated or correlated errors in realistic simulated data on the prediction abilities of augmented classical least squares and partial least squares.
Appl Spectrosc. 2004 Sep;58(9):1065-73. doi: 10.1366/0003702041959334.
6
An empirical comparison of some missing data treatments in PLS-SEM.
PLoS One. 2024 Jan 19;19(1):e0297037. doi: 10.1371/journal.pone.0297037. eCollection 2024.
7
Multiple imputation with missing data indicators.
Stat Methods Med Res. 2021 Dec;30(12):2685-2700. doi: 10.1177/09622802211047346. Epub 2021 Oct 13.
8
Multiple imputation using chained equations for missing data in survival models: applied to multidrug-resistant tuberculosis and HIV data.
J Public Health Afr. 2023 Jun 5;14(8):2388. doi: 10.4081/jphia.2023.2388. eCollection 2023 Aug 7.
9
Boosting partial least squares.
Anal Chem. 2005 Mar 1;77(5):1423-31. doi: 10.1021/ac048561m.
10
Treatment of missing values for multivariate statistical analysis of gel-based proteomics data.
Proteomics. 2008 Apr;8(7):1371-83. doi: 10.1002/pmic.200700975.

Cited by

1
A Framework Integrating GWAS and Genomic Selection to Enhance Prediction Accuracy of Economical Traits in Common Carp.
Int J Mol Sci. 2025 Jul 21;26(14):7009. doi: 10.3390/ijms26147009.
2
Evaluation of Low-Cost Multi-Spectral Sensors for Measuring Chlorophyll Levels Across Diverse Leaf Types.
Sensors (Basel). 2025 Mar 31;25(7):2198. doi: 10.3390/s25072198.
3
Multivariate analysis applied to X-ray fluorescence to assess soil contamination pathways: case studies of mass magnetic susceptibility in soils near abandoned coal and W/Sn mines.
Environ Geochem Health. 2024 May 2;46(6):202. doi: 10.1007/s10653-024-01988-3.
4
A community resource to mass explore the wheat grain proteome and its application to the late-maturity alpha-amylase (LMA) problem.
Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad084. Epub 2023 Nov 1.
5
Cost-Effective Open-Ended Coaxial Technique for Liquid Food Characterization by Using the Reflection Method for Industrial Applications.
Sensors (Basel). 2022 Jul 14;22(14):5277. doi: 10.3390/s22145277.
6
Fitting and Cross-Validating Cox Models to Censored Big Data With Missing Values Using Extensions of Partial Least Squares Regression Models.
Front Big Data. 2021 Nov 1;4:684794. doi: 10.3389/fdata.2021.684794. eCollection 2021.
7
A Cross-Cultural Analysis of the Influence of Timbre on Affect Perception in Western Classical Music and Chinese Music Traditions.
Front Psychol. 2021 Sep 29;12:732865. doi: 10.3389/fpsyg.2021.732865. eCollection 2021.
8
Formation Dominates Resorption With Increasing Mineralized Density and Time Postfracture in Cortical but Not Trabecular Bone: A Longitudinal HRpQCT Imaging Study in the Distal Radius.
JBMR Plus. 2021 Apr 8;5(6):e10493. doi: 10.1002/jbm4.10493. eCollection 2021 Jun.