Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression.

Author affiliations

Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada.

Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, United States.

Publication information

Genet Epidemiol. 2021 Dec;45(8):874-890. doi: 10.1002/gepi.22430. Epub 2021 Sep 1.

DOI:10.1002/gepi.22430

PMID:34468045

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9292988/

Abstract

Medical research increasingly includes high-dimensional regression modeling with a need for error-in-variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error-corrected cross-validation to enable error-in-variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high-dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross-validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency, as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate-adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error-in-variables adjustments more accessible for high-dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics-facilitated personalized medicine research.
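
The alternating scheme described in the abstract can be illustrated with a minimal sketch. It is not the BDCoCoLasso R package or the authors' exact algorithm. It assumes additive measurement error with a known variance `tau2` on the corrupted block, and it replaces CoCoLasso's positive semidefinite projection with simple eigenvalue clipping. All names (`X_u`, `Z_c`, `tau2`, `lam`, and the function names) are hypothetical, and error-corrected cross-validation is omitted.

```python
import numpy as np

def soft_threshold(x, t):
    """Scalar soft-thresholding operator used by the Lasso update."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def quad_lasso_cd(S, r, lam, beta, n_sweeps=10):
    """Coordinate descent for 0.5*b'Sb - r'b + lam*||b||_1 (S needs a positive diagonal)."""
    beta = beta.copy()
    for _ in range(n_sweeps):
        for j in range(len(beta)):
            # Gradient term with coordinate j's own contribution removed
            g = r[j] - S[j] @ beta + S[j, j] * beta[j]
            beta[j] = soft_threshold(g, lam) / S[j, j]
    return beta

def nearest_psd(S, eps=1e-4):
    """Eigenvalue clipping: an assumed stand-in for CoCoLasso's projection step."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * np.maximum(w, eps)) @ V.T

def block_cd_corrected_lasso(X_u, Z_c, y, tau2, lam, n_outer=50):
    """Alternate Lasso updates over the uncorrupted block (X_u) and corrupted block (Z_c)."""
    n = len(y)
    S_uu = X_u.T @ X_u / n
    # Error-corrected Gram matrix, projected only for the (smaller) corrupted block
    S_cc = nearest_psd(Z_c.T @ Z_c / n - tau2 * np.eye(Z_c.shape[1]))
    b_u = np.zeros(X_u.shape[1])
    b_c = np.zeros(Z_c.shape[1])
    for _ in range(n_outer):
        # Update uncorrupted coefficients with the corrupted block held fixed
        b_u = quad_lasso_cd(S_uu, X_u.T @ (y - Z_c @ b_c) / n, lam, b_u)
        # Update corrupted coefficients with the uncorrupted block held fixed
        b_c = quad_lasso_cd(S_cc, Z_c.T @ (y - X_u @ b_u) / n, lam, b_c)
    return b_u, b_c

# Synthetic example: 5 clean features, 5 features observed with additive noise
rng = np.random.default_rng(0)
n, tau2 = 2000, 0.25
X_u = rng.standard_normal((n, 5))
X_c = rng.standard_normal((n, 5))  # true values, never observed by the model
Z_c = X_c + np.sqrt(tau2) * rng.standard_normal((n, 5))
y = 2.0 * X_u[:, 0] + 1.5 * X_c[:, 0] + rng.standard_normal(n)
b_u, b_c = block_cd_corrected_lasso(X_u, Z_c, y, tau2, lam=0.05)
print(np.round(b_u, 2), np.round(b_c, 2))
```

In this sketch the eigendecomposition is applied only to the Gram matrix of the corrupted block, which is one way to see how separating the blocks could reduce computation when few features carry measurement error.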

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d878/9292988/abba2a9354c7/GEPI-45-874-g006.jpg

Similar articles

Adaptive CoCoLasso for High-Dimensional Measurement Error Models.

Entropy (Basel). 2025 Jan 21;27(2):97. doi: 10.3390/e27020097.

MEBoost: Variable selection in the presence of measurement error.

Stat Med. 2019 Jul 10;38(15):2705-2718. doi: 10.1002/sim.8130. Epub 2019 Mar 11.

HighDimMixedModels.jl: Robust high-dimensional mixed-effects models across omics data.

PLoS Comput Biol. 2025 Jan 13;21(1):e1012143. doi: 10.1371/journal.pcbi.1012143. eCollection 2025 Jan.

Simultaneous channel and feature selection of fused EEG features based on Sparse Group Lasso.

Biomed Res Int. 2015;2015:703768. doi: 10.1155/2015/703768. Epub 2015 Feb 24.

A Semismooth Newton Algorithm for High-Dimensional Nonconvex Sparse Learning.

IEEE Trans Neural Netw Learn Syst. 2020 Aug;31(8):2993-3006. doi: 10.1109/TNNLS.2019.2935001. Epub 2019 Sep 12.

A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank.

PLoS Genet. 2020 Oct 23;16(10):e1009141. doi: 10.1371/journal.pgen.1009141. eCollection 2020 Oct.

High-dimensional Cox models: the choice of penalty as part of the model building process.

Biom J. 2010 Feb;52(1):50-69. doi: 10.1002/bimj.200900064.

Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany.

Biom J. 2015 Sep;57(5):867-84. doi: 10.1002/bimj.201400143. Epub 2015 Jun 8.

Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

BMC Bioinformatics. 2011 May 9;12:138. doi: 10.1186/1471-2105-12-138.

Cited by

Applying a logistic regression-clustering joint model to analyze the causes of prolonged pre-analytic turnaround time for urine culture testing in hospital wards.

Front Digit Health. 2025 Jun 30;7:1603314. doi: 10.3389/fdgth.2025.1603314. eCollection 2025.

Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms.

Am J Hum Genet. 2025 May 20. doi: 10.1016/j.ajhg.2025.05.002.

Multi-omics regulatory network inference in the presence of missing data.

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad309.

Genetic determinants of polygenic prediction accuracy within a population.

Genetics. 2022 Nov 30;222(4). doi: 10.1093/genetics/iyac158.

References

A Polygenic Risk Score to Predict Future Adult Short Stature Among Children.

J Clin Endocrinol Metab. 2021 Jun 16;106(7):1918-1928. doi: 10.1210/clinem/dgab215.

Improved prediction of fracture risk leveraging a genome-wide polygenic risk score.

Genome Med. 2021 Feb 3;13(1):16. doi: 10.1186/s13073-021-00838-6.

Individuals with common diseases but with a low polygenic risk score could be prioritized for rare variant screening.

Genet Med. 2021 Mar;23(3):508-515. doi: 10.1038/s41436-020-01007-7. Epub 2020 Oct 28.

A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank.

PLoS Genet. 2020 Oct 23;16(10):e1009141. doi: 10.1371/journal.pgen.1009141. eCollection 2020 Oct.

A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank.

Am J Hum Genet. 2020 Aug 6;107(2):222-233. doi: 10.1016/j.ajhg.2020.06.003. Epub 2020 Jun 25.

Polygenic risk for coronary heart disease acts through atherosclerosis in type 2 diabetes.

Cardiovasc Diabetol. 2020 Jan 30;19(1):12. doi: 10.1186/s12933-020-0988-9.

A resource-efficient tool for mixed model association analysis of large-scale data.

Nat Genet. 2019 Dec;51(12):1749-1755. doi: 10.1038/s41588-019-0530-8. Epub 2019 Nov 25.

A meta-analysis of genome-wide association studies identifies multiple longevity genes.

Nat Commun. 2019 Aug 14;10(1):3669. doi: 10.1038/s41467-019-11558-2.

MEBoost: Variable selection in the presence of measurement error.

Stat Med. 2019 Jul 10;38(15):2705-2718. doi: 10.1002/sim.8130. Epub 2019 Mar 11.

Genomics of 1 million parent lifespans implicates novel pathways and common diseases and distinguishes survival chances.

Elife. 2019 Jan 15;8:e39856. doi: 10.7554/eLife.39856.
