The model adaptive space shrinkage (MASS) approach: a new method for simultaneous variable selection and outlier detection based on model population analysis.

Affiliation

School of Pharmaceutical Sciences, Central South University, Changsha 410013, PR China.

Publication information

Analyst. 2016 Oct 7;141(19):5586-97. doi: 10.1039/c6an00764c. Epub 2016 Jul 20.

DOI: 10.1039/c6an00764c
PMID: 27435388
Abstract

Variable selection and outlier detection are important steps in chemical modeling. They usually affect each other, and the order in which they are performed also strongly affects the modeling results. Currently, many studies perform these steps separately and in different orders. In this study, we examined the interaction between outliers and variables and compared modeling procedures that apply variable selection and outlier detection in different orders. Because the order can affect the interpretation of the model, it is difficult to decide which order is preferable when the predictive performance (prediction error) of the different orders is relatively close. To address this problem, a simultaneous variable selection and outlier detection approach called Model Adaptive Space Shrinkage (MASS) was developed. The proposed approach is based on model population analysis (MPA). Through weighted binary matrix sampling (WBMS) from the model space, a large number of partial least squares (PLS) regression models were built, and the elite portion of the models was selected to statistically reassign the weight of each variable and sample. The whole process was then repeated until the weights of the variables and samples converged. Finally, MASS adaptively found a high-performance model consisting of the optimized variable subset and sample subset. The combination of these two subsets can be considered the cleaned dataset used for chemical modeling. The proposed approach thus avoids the problem of ordering variable selection and outlier detection. One near-infrared (NIR) spectroscopy dataset and one quantitative structure-activity relationship (QSAR) dataset were used to test the approach. The results demonstrated that MASS is a useful method for data cleaning before building a predictive model.
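
The loop described in the abstract (sample sub-models by weight, keep the elite ones, reassign weights, repeat) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation. The abstract does not give the sampling sizes, the elite fraction, the starting weights, the scoring rule, or the convergence test, so all of those are assumptions here. To stay dependency-free, the sketch also swaps PLS for ordinary least squares with a tiny ridge term and scores each sub-model on a simple hold-out split. The names `mass_sketch`, `holdout_rmse`, `solve_least_squares`, and `make_toy_data` are hypothetical.

```python
import random


def solve_least_squares(rows, targets, ridge=1e-8):
    """Least squares via the normal equations (a stand-in for PLS, by assumption)."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    c = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    for i in range(p):  # forward elimination with partial pivoting
        piv = max(range(i, p), key=lambda k: abs(a[k][i]))
        a[i], a[piv] = a[piv], a[i]
        c[i], c[piv] = c[piv], c[i]
        for k in range(i + 1, p):
            f = a[k][i] / a[i][i]
            for j in range(i, p):
                a[k][j] -= f * a[i][j]
            c[k] -= f * c[i]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (c[i] - tail) / a[i][i]
    return coef


def holdout_rmse(X, y, sample_idx, var_idx):
    """Fit on every other selected sample and score on the rest (assumed scoring rule)."""
    def design(i):
        return [1.0] + [X[i][j] for j in var_idx]  # intercept + chosen variables

    train, test = sample_idx[0::2], sample_idx[1::2]
    coef = solve_least_squares([design(i) for i in train], [y[i] for i in train])
    sq = [(y[i] - sum(b * v for b, v in zip(coef, design(i)))) ** 2 for i in test]
    return (sum(sq) / len(sq)) ** 0.5


def mass_sketch(X, y, n_models=300, elite_frac=0.1, n_iter=8, tol=1e-9, seed=0):
    """Return (variable weights, sample weights) after iterative reweighting."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w_var, w_smp = [0.5] * p, [0.5] * n  # assumed starting weights
    for _ in range(n_iter):
        scored = []
        for _ in range(n_models):
            # Weighted binary sampling: each variable/sample is included
            # with probability equal to its current weight.
            cols = [j for j in range(p) if rng.random() < w_var[j]]
            rows = [i for i in range(n) if rng.random() < w_smp[i]]
            if not cols or len(rows) < 2 * (len(cols) + 2):
                continue  # too small to fit and score
            scored.append((holdout_rmse(X, y, rows, cols), rows, cols))
        if not scored:
            break
        scored.sort(key=lambda m: m[0])
        elite = scored[:max(1, int(elite_frac * len(scored)))]
        # New weight = how often the item appears among the elite sub-models.
        new_var = [sum(j in m[2] for m in elite) / len(elite) for j in range(p)]
        new_smp = [sum(i in m[1] for m in elite) / len(elite) for i in range(n)]
        change = max(abs(u - v) for u, v in zip(new_var + new_smp, w_var + w_smp))
        w_var, w_smp = new_var, new_smp
        if change < tol:  # weights stopped moving
            break
    return w_var, w_smp


def make_toy_data(seed=1, n=40, p=6, outliers=(3, 17, 30)):
    """Hypothetical data: y depends on variables 0 and 1; three samples are shifted."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * r[0] - 3 * r[1] + rng.gauss(0, 0.1) for r in X]
    for i in outliers:
        y[i] += 15.0
    return X, y
```

Calling `X, y = make_toy_data()` followed by `w_var, w_smp = mass_sketch(X, y)` gives one weight per variable and one per sample. In this toy setup, variables and samples whose weights shrink toward zero are the candidates to drop, which mirrors the idea that the surviving variable subset and sample subset together form the cleaned dataset.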

Similar articles

1. The model adaptive space shrinkage (MASS) approach: a new method for simultaneous variable selection and outlier detection based on model population analysis.
   Analyst. 2016 Oct 7;141(19):5586-97. doi: 10.1039/c6an00764c. Epub 2016 Jul 20.
2. A bootstrapping soft shrinkage approach for variable selection in chemical modeling.
   Anal Chim Acta. 2016 Feb 18;908:63-74. doi: 10.1016/j.aca.2016.01.001. Epub 2016 Jan 7.
3. A modification of the bootstrapping soft shrinkage approach for spectral variable selection in the issue of over-fitting, model accuracy and variable selection credibility.
   Spectrochim Acta A Mol Biomol Spectrosc. 2019 Mar 5;210:362-371. doi: 10.1016/j.saa.2018.10.034. Epub 2018 Oct 24.
4. [Quantitative analysis method of natural gas combustion process combining wavelength selection and outlier spectra detection].
   Guang Pu Xue Yu Guang Pu Fen Xi. 2012 Oct;32(10):2799-804.
5. Simultaneous wavelength selection and outlier detection in multivariate regression of near-infrared spectra.
   Anal Sci. 2005 Feb;21(2):161-6. doi: 10.2116/analsci.21.161.
6. A novel variable selection approach that iteratively optimizes variable space using weighted binary matrix sampling.
   Analyst. 2014 Oct 7;139(19):4836-45. doi: 10.1039/c4an00730a.
7. Comparison of methods for the detection of outliers and associated biomarkers in mislabeled omics data.
   BMC Bioinformatics. 2020 Aug 14;21(1):357. doi: 10.1186/s12859-020-03653-9.
8. Toward better QSAR/QSPR modeling: simultaneous outlier detection and variable selection using distribution of model features.
   J Comput Aided Mol Des. 2011 Jan;25(1):67-80. doi: 10.1007/s10822-010-9401-1. Epub 2010 Nov 13.
9. Importance of prediction outlier diagnostics in determining a successful inter-vendor multivariate calibration model transfer.
   Appl Spectrosc. 2007 Jul;61(7):747-54. doi: 10.1366/000370207781393280.
10. Outlier detection and robust variable selection via the penalized weighted LAD-LASSO method.
    J Appl Stat. 2020 Feb 4;48(2):234-246. doi: 10.1080/02664763.2020.1722079. eCollection 2021.

Cited by

1. Rapid and Quantitative Prediction of Tea Pigments Content During the Rolling of Black Tea by Multi-Source Information Fusion and System Analysis Methods.
   Foods. 2025 Aug 15;14(16):2829. doi: 10.3390/foods14162829.
2. Monitoring the major taste components during black tea fermentation using multielement fusion information in decision level.
   Food Chem X. 2023 May 22;18:100718. doi: 10.1016/j.fochx.2023.100718. eCollection 2023 Jun 30.
3. Non-destructive detection of kiwifruit soluble solid content based on hyperspectral and fluorescence spectral imaging.
   Front Plant Sci. 2023 Jan 18;13:1075929. doi: 10.3389/fpls.2022.1075929. eCollection 2022.
4. Analysis of near infrared spectra for age-grading of wild populations of Anopheles gambiae.
   Parasit Vectors. 2017 Nov 7;10(1):552. doi: 10.1186/s13071-017-2501-1.