Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression.

Affiliations

Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada.

Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, United States.

Publication information

Genet Epidemiol. 2021 Dec;45(8):874-890. doi: 10.1002/gepi.22430. Epub 2021 Sep 1.

DOI:10.1002/gepi.22430
PMID:34468045
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9292988/
Abstract

Medical research increasingly includes high-dimensional regression modeling with a need for error-in-variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error-corrected cross-validation to enable error-in-variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high-dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross-validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency, as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate-adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error-in-variables adjustments more accessible for high-dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics-facilitated personalized medicine research.
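The abstract does not give the update rules. The code below is therefore only an illustrative sketch, not the authors' algorithm and not the API of the BDCoCoLasso R package. It rests on several assumptions. The measurement error is taken to be additive and independent, with a known variance for each corrupted column. From that, the sketch forms bias-corrected moments Σ = ZᵀZ/n − diag(error variances) and ρ = Zᵀy/n. It then minimizes a CoCoLasso-style objective ½βᵀΣβ − ρᵀβ + λ‖β‖₁ by alternating between the uncorrupted block and the corrupted block, running coordinate-descent sweeps within one block while the other block is held fixed. The helper names `corrected_gram` and `block_cd_lasso` are hypothetical. The sketch also leaves out three parts of the real method: CoCoLasso's projection of Σ onto the nearest positive semidefinite matrix (here we only assume a positive diagonal), the calibrated cross-validation, and the SCAD option.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def soft_threshold(z: float, lam: float) -> float:
    """Lasso proximal operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def corrected_gram(Z: Matrix, y: Sequence[float],
                   err_var: Sequence[float]) -> Tuple[Matrix, List[float]]:
    """Hypothetical helper: bias-corrected moments under additive error.

    sigma = Z'Z/n with err_var[j] subtracted on the diagonal (err_var[j] == 0
    for uncorrupted columns); rho = Z'y/n. No PSD projection is applied here.
    """
    n, p = len(Z), len(Z[0])
    sigma = [[sum(Z[i][j] * Z[i][k] for i in range(n)) / n for k in range(p)]
             for j in range(p)]
    rho = [sum(Z[i][j] * y[i] for i in range(n)) / n for j in range(p)]
    for j in range(p):
        sigma[j][j] -= err_var[j]
    return sigma, rho


def block_cd_lasso(sigma: Matrix, rho: Sequence[float], lam: float,
                   blocks: Sequence[Sequence[int]], inner_sweeps: int = 5,
                   max_outer: int = 500, tol: float = 1e-10) -> List[float]:
    """Hypothetical solver: minimise 0.5*b'Sigma b - rho'b + lam*||b||_1.

    Each outer iteration visits the blocks in turn (e.g. uncorrupted, then
    corrupted features) and runs a few coordinate-descent sweeps on the
    current block while all other coefficients stay fixed.
    Assumes every sigma[j][j] > 0.
    """
    p = len(rho)
    beta = [0.0] * p
    for _ in range(max_outer):
        max_change = 0.0
        for block in blocks:
            for _ in range(inner_sweeps):
                for j in block:
                    # rho_j minus the contribution of all other coefficients.
                    partial = rho[j] - sum(sigma[j][k] * beta[k]
                                           for k in range(p) if k != j)
                    new = soft_threshold(partial, lam) / sigma[j][j]
                    max_change = max(max_change, abs(new - beta[j]))
                    beta[j] = new
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy data: column 0 treated as error-free, column 1 as corrupted
    # with an assumed error variance of 0.5.
    Z = [[1.0, 2.0], [-1.0, 0.0], [1.0, -2.0], [-1.0, 0.0]]
    y = [1.0, 0.0, 1.0, 0.0]
    sigma, rho = corrected_gram(Z, y, err_var=[0.0, 0.5])
    print(block_cd_lasso(sigma, rho, lam=0.1, blocks=[[0], [1]]))
```

When Σ is positive definite, this objective is convex, so grouping the coordinates into blocks changes only the order of the updates, not the minimizer. According to the abstract, the benefit of optimizing the uncorrupted and corrupted features separately is lower computational cost. This toy version makes no attempt to reproduce that saving.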

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d878/9292988/a5a4de672960/GEPI-45-874-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d878/9292988/abba2a9354c7/GEPI-45-874-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d878/9292988/837e63c28677/GEPI-45-874-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d878/9292988/0697b681fd46/GEPI-45-874-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d878/9292988/a907455e0966/GEPI-45-874-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d878/9292988/a38e32126a5d/GEPI-45-874-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d878/9292988/986fc63a8bd3/GEPI-45-874-g003.jpg

Similar articles

1. Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression.
   Genet Epidemiol. 2021 Dec;45(8):874-890. doi: 10.1002/gepi.22430. Epub 2021 Sep 1.
2. Adaptive CoCoLasso for High-Dimensional Measurement Error Models.
   Entropy (Basel). 2025 Jan 21;27(2):97. doi: 10.3390/e27020097.
3. MEBoost: Variable selection in the presence of measurement error.
   Stat Med. 2019 Jul 10;38(15):2705-2718. doi: 10.1002/sim.8130. Epub 2019 Mar 11.
4. HighDimMixedModels.jl: Robust high-dimensional mixed-effects models across omics data.
   PLoS Comput Biol. 2025 Jan 13;21(1):e1012143. doi: 10.1371/journal.pcbi.1012143. eCollection 2025 Jan.
5. Simultaneous channel and feature selection of fused EEG features based on Sparse Group Lasso.
   Biomed Res Int. 2015;2015:703768. doi: 10.1155/2015/703768. Epub 2015 Feb 24.
6. A Semismooth Newton Algorithm for High-Dimensional Nonconvex Sparse Learning.
   IEEE Trans Neural Netw Learn Syst. 2020 Aug;31(8):2993-3006. doi: 10.1109/TNNLS.2019.2935001. Epub 2019 Sep 12.
7. A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank.
   PLoS Genet. 2020 Oct 23;16(10):e1009141. doi: 10.1371/journal.pgen.1009141. eCollection 2020 Oct.
8. High-dimensional Cox models: the choice of penalty as part of the model building process.
   Biom J. 2010 Feb;52(1):50-69. doi: 10.1002/bimj.200900064.
9. Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany.
   Biom J. 2015 Sep;57(5):867-84. doi: 10.1002/bimj.201400143. Epub 2015 Jun 8.
10. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.
    BMC Bioinformatics. 2011 May 9;12:138. doi: 10.1186/1471-2105-12-138.

Cited by

1. Applying a logistic regression-clustering joint model to analyze the causes of prolonged pre-analytic turnaround time for urine culture testing in hospital wards.
   Front Digit Health. 2025 Jun 30;7:1603314. doi: 10.3389/fdgth.2025.1603314. eCollection 2025.
2. Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms.
   Am J Hum Genet. 2025 May 20. doi: 10.1016/j.ajhg.2025.05.002.
3. Multi-omics regulatory network inference in the presence of missing data.
   Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad309.
4. Genetic determinants of polygenic prediction accuracy within a population.
   Genetics. 2022 Nov 30;222(4). doi: 10.1093/genetics/iyac158.