Bermann Matias, Álvarez Múnera Alejandra, Misztal Ignacy, Lourenco Daniela
Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA.
G3 (Bethesda). 2025 Aug 6;15(8). doi: 10.1093/g3journal/jkaf136.
Validation of genomic predictions or polygenic risk scores is key for model selection and for evaluating the performance of the chosen prediction machinery. Non-parametric validation, such as cross-validation, is popular but does not account for population structure, nor for the fact that the interest may lie in validating predictions for a specific set of individuals rather than for the entire population. Semi-parametric methods, such as the LR method, also use removed records to validate predictions, but they account for population structure and allow the validation to focus on a specific set of individuals of interest. They also yield confidence intervals without the need for repeated cross-validation. We developed a tool within the Blupf90 software suite, called validationf90, that allows researchers to conduct semi-parametric validation from the solutions obtained with that suite. validationf90 calculates several validation statistics, with their confidence intervals, for a pre-defined set of individuals of interest; these statistics reflect the bias and accuracy of genomic predictions. The program accepts genomic predictions obtained with frequentist or Bayesian methods and supports categorical data. validationf90 can validate any model supported by the Blupf90 software suite and can be used with animal, plant, and human datasets. Predictions obtained with other software can be provided to validationf90 as long as the input format matches the Blupf90 format.
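To make the idea of validating with removed records concrete, the following is a minimal sketch of LR-type statistics as they are commonly defined: predictions for a focal set of individuals are obtained once with all records ("whole") and once with some records removed ("partial"), and the two sets of predictions are compared. It is an illustration only and not validationf90's implementation, interface, or file format. The function name `lr_statistics`, the dictionary inputs, and the toy values are all hypothetical, and confidence intervals are not covered.

```python
from statistics import mean


def lr_statistics(pred_whole, pred_partial, focal_ids):
    """Compare whole-data and partial-data predictions for focal individuals.

    pred_whole, pred_partial: dicts mapping individual id -> prediction
    (hypothetical in-memory layout, not the Blupf90 solutions format).
    focal_ids: ids of the individuals of interest.
    """
    w = [pred_whole[i] for i in focal_ids]
    p = [pred_partial[i] for i in focal_ids]
    n = len(focal_ids)
    mean_w, mean_p = mean(w), mean(p)

    # Sample (co)variances computed over the focal individuals only.
    cov_wp = sum((a - mean_w) * (b - mean_p) for a, b in zip(w, p)) / (n - 1)
    var_w = sum((a - mean_w) ** 2 for a in w) / (n - 1)
    var_p = sum((b - mean_p) ** 2 for b in p) / (n - 1)

    return {
        # Difference of means, partial minus whole (expected near 0).
        "bias": mean_p - mean_w,
        # Regression slope of whole on partial predictions (expected near 1).
        "dispersion": cov_wp / var_p,
        # Correlation between partial and whole predictions.
        "correlation": cov_wp / (var_w * var_p) ** 0.5,
    }


# Toy values chosen for illustration; individual 5 is outside the focal set.
whole = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 9.0}
partial = {1: 0.5, 2: 1.0, 3: 1.5, 4: 2.0, 5: -3.0}
stats = lr_statistics(whole, partial, focal_ids=[1, 2, 3, 4])
print(stats)
```

Restricting the computation to `focal_ids` mirrors the point made above that validation can target a chosen set of individuals rather than the entire population.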