Regenie.QRS: computationally efficient whole-genome quantile regression at biobank scale.

Authors

Wang Fan, Wang Chen, Wang Tianying, Masala Marco, Fiorillo Edoardo, Devoto Marcella, Cucca Francesco, Ionita-Laza Iuliana

Affiliations

Department of Biostatistics, Columbia University, New York, US.

Department of Statistics, Colorado State University, Fort Collins, US.

Publication information

bioRxiv. 2025 May 7:2025.05.02.651730. doi: 10.1101/2025.05.02.651730.

Abstract

Genotype-phenotype associations can be context-dependent and dynamic in nature, leading to heterogeneity of genetic effects across different parts of the phenotype distribution. Quantile regression, an alternative to linear regression for continuous phenotypes, is particularly well suited for detecting and characterizing heterogeneous genotype-phenotype associations. Here we propose a novel and computationally efficient whole-genome quantile regression technique, Regenie.QRS, for biobank-scale GWAS data with genetic structure. Our approach first estimates the polygenic effect, and then incorporates this effect as an offset in the non-mixed quantile regression model. Our simulations demonstrate robust control of type I error, higher power to detect heterogeneous associations relative to linear regression in GWAS, and improved power over marginal quantile regression tests. We present applications using data from the UK Biobank and the ProgeNIA/SardiNIA project, where we show the advantages of Regenie.QRS in identifying and characterizing heterogeneous genetic effects. To cite just one interesting example, using quantile regression we are able to show that even though variants at the locus increase glucose levels, their effects are much stronger at lower quantiles of the glucose level distribution than at higher quantiles, showing that this locus serves as a guardian against low glucose levels without driving dangerous hyperglycemia, which may explain the lack of association with diabetes risk.
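The two-step idea described in the abstract (estimate a polygenic effect, then use it as an offset in an ordinary, non-mixed quantile regression test) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single in-sample ridge regression for the polygenic step (the abstract does not say how that effect is estimated), an intercept-only null model, and a standard quantile rank score statistic for a single variant. The function names `ridge_polygenic_offset` and `quantile_rank_score_test` and the parameter `lam` are hypothetical.

```python
import math
import numpy as np


def ridge_polygenic_offset(G, y, lam):
    """Step 1 (simplified assumption): ridge prediction of the phenotype
    from a matrix of background genotypes G (n samples x m variants)."""
    Gc = G - G.mean(axis=0)          # center each variant
    yc = y - y.mean()                # center the phenotype
    m = G.shape[1]
    beta = np.linalg.solve(Gc.T @ Gc + lam * np.eye(m), Gc.T @ yc)
    return Gc @ beta                 # predicted polygenic effect per sample


def quantile_rank_score_test(g, y, offset, tau):
    """Step 2: rank score test of one variant g at quantile level tau,
    after subtracting the polygenic offset from the phenotype.
    The null model here has an intercept only (no covariates)."""
    r = y - offset                   # offset-adjusted phenotype
    q = np.quantile(r, tau)          # null fit: the tau-th sample quantile
    psi = tau - (r < q).astype(float)  # quantile score of each sample
    gc = g - g.mean()                # project the intercept out of g
    score = gc @ psi
    var = tau * (1.0 - tau) * (gc @ gc)
    stat = score ** 2 / var          # approximately chi-square, 1 df, under the null
    pval = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, pval
```

Calling `quantile_rank_score_test` at several values of `tau` (for example 0.1, 0.5 and 0.9) for the same variant gives one statistic per quantile level, which is how effects that differ between the lower and upper parts of the phenotype distribution would show up in this sketch. In a real analysis the tested variant should not contribute to the offset, and covariates would enter the null model.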

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bbfa/12247936/1a682abb6b4b/nihpp-2025.05.02.651730v1-f0001.jpg