Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
Genet Epidemiol. 2024 Oct;48(7):310-323. doi: 10.1002/gepi.22578. Epub 2024 Jun 28.
In most proteome-wide association studies (PWAS), variants near the protein-coding gene (±1 Mb), also known as cis single nucleotide polymorphisms (SNPs), are used to predict protein levels, which are then tested for association with phenotypes. However, proteins can also be regulated by variants outside the cis region. An intermediate genome-wide association study (GWAS) step to identify protein quantitative trait loci (pQTL) allows trans SNPs, which lie outside the cis region, to be included in protein-level prediction models. Here, we assess the prediction of 540 proteins in 1002 individuals from the Women's Health Initiative (WHI), split equally into a GWAS set, an elastic net training set, and a testing set. We compared the testing r (the correlation between measured and predicted protein levels in the testing set) obtained with this proposed approach to the testing r obtained using only cis SNPs. The two methods usually resulted in similar testing r, but some proteins showed a significant increase in testing r with our method. For example, for cartilage acidic protein 1, the testing r increased from 0.101 to 0.351. Using our PWAS weights, we also demonstrate reproducible associations of predicted protein levels with lipid and blood cell traits, both in WHI participants without proteomics data and in UK Biobank.