Efficient blockLASSO for polygenic scores with applications to All of Us and UK Biobank.

Author information

Raben Timothy G, Lello Louis, Widen Erik, Hsu Stephen D H

Affiliations

Department of Physics and Astronomy, Michigan State University, East Lansing, USA.

Genomic Prediction, Inc., North Brunswick, NJ, USA.

Publication information

BMC Genomics. 2025 Mar 27;26(1):302. doi: 10.1186/s12864-025-11505-0.

DOI:10.1186/s12864-025-11505-0

PMID:40148775

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11948729/

Abstract

We develop a "block" LASSO (blockLASSO) approach for training polygenic scores (PGS) and demonstrate its use in All of Us (AoU) and the UK Biobank (UKB). blockLASSO exploits the approximately block-diagonal structure of linkage disequilibrium (LD), which arises from the partition of the genome into chromosomes. The new implementation can be used for exploratory and methods research where repeated PGS training is necessary and expensive. For 11 different phenotypes, in two different biobanks, and across 5 different ancestry groups (African, American, East Asian, European, and South Asian), we demonstrate that blockLASSO is generally as effective for training PGS as a (global) LASSO. Previous work has shown that penalized regression methods produce PGS that are competitive with alternative approaches. It has also been shown that some phenotypes are more polygenic than others. Using sparse algorithms, an accurate PGS can be trained for type 1 diabetes (T1D) using single nucleotide variants (SNVs), but a PGS for body mass index (BMI) would need more than 10k SNVs. blockLASSO produces similar PGS across phenotypes while training with just a fraction of the variants per block. Within AoU (using only genetic information), the block PGS for T1D reaches an AUC of [value not given] and for BMI a correlation of [value not given], whereas a global LASSO approach finds for T1D an AUC of [value not given] and for BMI a correlation of [value not given]. This new block approach is more computationally efficient and scalable than naive global machine learning approaches, making it ideal for exploratory methods investigations based on penalized regression.
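As a rough illustration of the block idea described in the abstract, the following toy sketch fits one LASSO per block of variants and concatenates the resulting weights into a single score. It is not the authors' implementation: the function names (`lasso_cd`, `block_lasso`, `predict`, `pearson`), the plain coordinate-descent solver, the simulated genotypes, the contiguous index ranges standing in for chromosomes, the penalty value, and the choice to combine blocks by simple concatenation are all assumptions made for this example. Correlations are computed in-sample, so they say nothing about real predictive performance.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.
    X is a list of rows; columns and y are assumed to be centred."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of column j with the partial residual (column j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def block_lasso(X, y, blocks, lam):
    """Fit an independent LASSO on each block of columns against the same
    phenotype, then place the block weights into one weight vector
    (an assumed, simplest-possible way of combining blocks)."""
    beta = [0.0] * len(X[0])
    for cols in blocks:
        X_block = [[row[j] for j in cols] for row in X]
        for j, b in zip(cols, lasso_cd(X_block, y, lam)):
            beta[j] = b
    return beta

def predict(X, beta):
    """Linear score: weighted sum of (centred) genotypes per individual."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

# Simulated data (assumption): 200 individuals, 30 independent variants,
# three blocks of 10 variants standing in for chromosomes.
random.seed(0)
n, p, block_size, freq = 200, 30, 10, 0.3
X = [[float((random.random() < freq) + (random.random() < freq))
      for _ in range(p)] for _ in range(n)]
means = [sum(row[j] for row in X) / n for j in range(p)]
X = [[row[j] - means[j] for j in range(p)] for row in X]

true_beta = [0.0] * p
true_beta[0], true_beta[12], true_beta[25] = 1.0, -1.0, 0.8  # one causal variant per block
y = [g + random.gauss(0.0, 0.5) for g in predict(X, true_beta)]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

blocks = [list(range(s, s + block_size)) for s in range(0, p, block_size)]
beta_block = block_lasso(X, y, blocks, lam=0.1)
beta_global = lasso_cd(X, y, lam=0.1)
pgs_block, pgs_global = predict(X, beta_block), predict(X, beta_global)

print("block  PGS vs phenotype:", round(pearson(pgs_block, y), 3))
print("global PGS vs phenotype:", round(pearson(pgs_global, y), 3))
```

Because each block is fitted separately, the per-block problems are smaller than the global one and could be run in parallel, which is the kind of computational saving the abstract points to. In this toy setup the variants are simulated independently, so it does not exercise within-block LD.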

