Variable selection models based on multiple imputation with an application for predicting median effective dose and maximum effect.

Author information

Wan Y, Datta S, Conklin D J, Kong M

Affiliations

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, USA.

Division of Cardiovascular Medicine, Department of Medicine, University of Louisville, Louisville, KY, USA.

Publication information

J Stat Comput Simul. 2015;85(9):1902-1916. doi: 10.1080/00949655.2014.907801.

Abstract

Statistical methods for variable selection and prediction can be challenging to apply when covariates are missing. Although multiple imputation (MI) is a widely accepted technique for handling missing data, how to combine MI results for variable selection is not clear, because different imputations may lead to different selections. Widely applied variable selection methods include the sparse partial least-squares (SPLS) method and penalized least-squares methods such as the elastic net (ENet). In this paper, we propose an MI-based weighted elastic net (MI-WENet) method that is based on stacked MI data and a weighting scheme for each observation in the stacked data set. In the MI-WENet method, MI accounts for sampling and imputation uncertainty for missing values, and the weight accounts for the observed information. Extensive numerical simulations are carried out to compare the proposed MI-WENet method with competing alternatives such as SPLS and ENet. In addition, we applied the MI-WENet method to examine the predictor variables for endothelial function, which can be characterized by median effective dose (ED50) and maximum effect (Emax), in an ex vivo phenylephrine-induced extension and acetylcholine-induced relaxation experiment.
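The stacking-and-weighting idea in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the weight formula, so the sketch assumes each stacked row is weighted by the subject's fraction of observed covariates divided by the number of imputations; scikit-learn's `IterativeImputer` stands in for whatever imputation model the paper uses; and the function name `stacked_weighted_enet` and the tuning values `alpha` and `l1_ratio` are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import ElasticNet


def stacked_weighted_enet(X, y, n_imputations=5, alpha=0.1, l1_ratio=0.5, seed=0):
    """Fit one elastic net on stacked multiply-imputed data with row weights.

    X may contain np.nan for missing covariates; y is assumed fully observed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # ASSUMPTION: weight = (fraction of observed covariates) / (number of imputations).
    observed_fraction = 1.0 - np.isnan(X).mean(axis=1)

    # Draw several completed data sets; sample_posterior=True makes each draw stochastic.
    imputed = []
    for m in range(n_imputations):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed + m)
        imputed.append(imputer.fit_transform(X))

    # Stack the completed data sets row-wise and repeat y and the weights to match.
    X_stacked = np.vstack(imputed)
    y_stacked = np.tile(y, n_imputations)
    weights = np.tile(observed_fraction / n_imputations, n_imputations)

    # A single penalized fit on the stacked data yields one selected set of variables.
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000)
    model.fit(X_stacked, y_stacked, sample_weight=weights)
    selected = np.flatnonzero(model.coef_ != 0)
    return model, selected
```

Because the penalized model is fitted once on the stacked data, a variable is either selected or not across all imputations, which avoids reconciling different selections from separate per-imputation fits. In practice `alpha` and `l1_ratio` would be tuned, for example by cross-validation that keeps all stacked copies of a subject in the same fold.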
