Irigoien Itziar, Mas-Bermejo Patricia, Papiol Sergi, Barrantes-Vidal Neus, Rosa Araceli, Arenas Concepción
Department of Computation and Artificial Intelligence, Euskal Herriko Unibertsitatea (UPV/EHU), Donostia, Spain.
Zoology and Biological Anthropology Section of the Evolutionary Biology, Ecology and Environmental Sciences Department, Universitat de Barcelona (UB), Barcelona, Spain.
Front Psychiatry. 2025 Aug 21;16:1621972. doi: 10.3389/fpsyt.2025.1621972. eCollection 2025.
Most methodological papers on Polygenic Risk Scores (PRSs) explain the laborious process of computing the PRS in great depth. Afterwards, as a final step, they generally state that, to test a possible association between a PRS and a trait of interest, regression models (linear or logistic, depending on the data type) should be fitted, adjusting for covariates (e.g., sex, age, clinical information, or genetic ancestry-based principal components). When covariates are included, measures such as the increase in variance explained by adding the PRS to the model, or the significance of the PRS term, are usually reported. However, studying the association between PRSs and a trait is a complex problem that requires proper modeling and analysis, since interactions and validation conditions are crucial aspects. Although excellent papers explain how to use and interpret the results of such regression models, important information from the previously calculated PRS may sometimes be lost, partly due to the automation of analyses. With this guide, we intend to fill a gap in association studies between PRSs and a trait and to facilitate analyses that yield statistically correct results. The guide contains a motivating real-data case, analyzed exhaustively, to illustrate how to approach a real analysis. In addition, it is accompanied by four examples, called , which present different situations the researcher may encounter, together with the R code for analyzing all these data sets and the corresponding application of the steps in this guide.
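The standard covariate-adjusted analysis described above compares nested linear models: a covariates-only model against one that adds the PRS (and, optionally, a PRS-by-covariate interaction). The sketch below is a minimal illustration under stated assumptions, not the authors' R code: it assumes a continuous trait, uses simulated data, and the helper names `fit_ols` and `compare_nested` are hypothetical. It reports the increment in R² and the partial F statistic for the added terms. For a binary trait, logistic regression with a likelihood-ratio test (or a pseudo-R²) would be used instead; that case is not covered here.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; returns the residual sum of squares and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    tss = float(np.sum((y - y.mean()) ** 2))
    return rss, 1.0 - rss / tss

def compare_nested(X_null, X_full, y):
    """Incremental R^2 and partial F statistic for the columns added in X_full."""
    rss0, r2_0 = fit_ols(X_null, y)
    rss1, r2_1 = fit_ols(X_full, y)
    q = X_full.shape[1] - X_null.shape[1]   # number of added terms
    df_resid = len(y) - X_full.shape[1]     # residual df of the full model
    f_stat = ((rss0 - rss1) / q) / (rss1 / df_resid)
    return {"r2_null": r2_0, "r2_full": r2_1, "delta_r2": r2_1 - r2_0,
            "F": f_stat, "df": (q, df_resid)}

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(1)
n = 500
sex = rng.integers(0, 2, n).astype(float)
age = rng.normal(40, 10, n)
pcs = rng.normal(size=(n, 2))               # stand-in for ancestry principal components
prs = rng.normal(size=n)
y = 0.4 * prs + 0.3 * sex + 0.02 * age + 0.2 * prs * sex + rng.normal(size=n)

ones = np.ones((n, 1))
X_cov = np.column_stack([ones, sex, age, pcs])   # intercept + covariates only
X_prs = np.column_stack([X_cov, prs])            # + PRS main effect
X_int = np.column_stack([X_prs, prs * sex])      # + PRS x sex interaction

main = compare_nested(X_cov, X_prs, y)
inter = compare_nested(X_prs, X_int, y)
print(f"PRS main effect: delta R^2 = {main['delta_r2']:.4f}, F{main['df']} = {main['F']:.2f}")
print(f"PRS x sex:       delta R^2 = {inter['delta_r2']:.4f}, F{inter['df']} = {inter['F']:.2f}")
```

Testing the interaction as a separate nested comparison, rather than only reporting the PRS main effect, is one way to avoid losing information such as effect modification by sex; a p-value can be obtained from the F distribution with the reported degrees of freedom.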