Department of Biostatistics, Columbia University, New York, NY, USA.
Division of Nephrology, Department of Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA.
Nat Commun. 2024 Jul 31;15(1):6460. doi: 10.1038/s41467-024-50726-x.
Genome-wide association studies (GWAS) of biomarkers relevant to clinical phenotypes can lead to clinically relevant discoveries. Conventional GWAS for quantitative traits rely on simplified regression models that treat the conditional mean of a phenotype as a linear function of genotype. We draw attention here to an alternative, lesser-known approach, quantile regression, which naturally extends linear regression to the analysis of the entire conditional distribution of a phenotype of interest. Quantile regression can be applied efficiently at biobank scale and has several distinctive advantages: (1) it identifies variants with heterogeneous effects across quantiles of the phenotype distribution; (2) it accommodates a wide range of phenotype distributions, including non-normal distributions, with results that are invariant to trait transformations; and (3) it provides more detailed information about genotype-phenotype associations, even for associations already identified by conventional GWAS. We show in simulations that quantile regression is powerful under both homogeneous and various heterogeneous models. Applications to 39 quantitative traits in the UK Biobank demonstrate that quantile regression can be a helpful complement to linear regression in GWAS and can identify variants that have larger effects in high-risk subgroups of individuals but a lower or no contribution overall.
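The contrast between a conditional-mean model and a quantile model can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy data, the function names (`check_loss`, `fit_quantile_line`, `ols_slope`) and the brute-force fitting strategy are all assumptions chosen for illustration, and a biobank-scale analysis would use an efficient solver rather than enumeration. The sketch fits a line of phenotype on genotype (coded 0, 1, 2) at several quantile levels by minimizing the standard check (pinball) loss, and compares the slopes with the ordinary least-squares slope.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Check (pinball) loss of one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_line(x, y, tau):
    """Brute-force quantile regression of y on one predictor x.

    Some optimal line for the check loss passes through at least two
    data points, so on a tiny data set we can enumerate those lines.
    Illustration only: this is far too slow for real GWAS data.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be a regression fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, tau)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_slope(x, y):
    """Ordinary least-squares slope (the conditional-mean effect)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy data: 11 individuals per genotype group, with the
# phenotype spread growing with the number of effect alleles.
genotype = [g for g in (0, 1, 2) for _ in range(11)]
phenotype = [k * (1 + g) for g in (0, 1, 2) for k in range(11)]

mean_effect = ols_slope(genotype, phenotype)
quantile_effects = {tau: fit_quantile_line(genotype, phenotype, tau)[1]
                    for tau in (0.1, 0.5, 0.9)}

print("OLS slope:", mean_effect)
for tau, slope in quantile_effects.items():
    print(f"tau={tau}: slope={slope}")
```

On this constructed data set the mean effect and the median effect are both 5 per allele, while the effect is 1 at the 0.1 quantile and 9 at the 0.9 quantile. A single conditional-mean slope would therefore hide the fact that the variant acts much more strongly in the upper tail of the phenotype distribution.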