Confidence intervals for validation statistics with data truncation in genomic prediction.

Affiliations

Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA.

Council on Dairy Cattle Breeding (CDCB), Bowie, MD, 20716, USA.

Publication information

Genet Sel Evol. 2024 Mar 8;56(1):18. doi: 10.1186/s12711-024-00883-w.

DOI:10.1186/s12711-024-00883-w

PMID:38459504

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11234739/

Abstract

BACKGROUND

Validation by data truncation is a common practice in genetic evaluations because of the interest in predicting the genetic merit of a set of young selection candidates. Two of the most used validation methods in genetic evaluations use a single data partition: predictivity or predictive ability (correlation between pre-adjusted phenotypes and estimated breeding values (EBV) divided by the square root of the heritability) and the linear regression (LR) method (comparison of "early" and "late" EBV). Both methods compare predictions with the whole dataset and a partial dataset that is obtained by removing the information related to a set of validation individuals. EBV obtained with the partial dataset are compared against adjusted phenotypes for the predictivity or EBV obtained with the whole dataset in the LR method. Confidence intervals for predictivity and the LR method can be obtained by replicating the validation for different samples (or folds), or bootstrapping. Analytical confidence intervals would be beneficial to avoid running several validations and to test the quality of the bootstrap intervals. However, analytical confidence intervals are unavailable for predictivity and the LR method.
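The statistics named above can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions based on how these statistics are commonly defined for the LR method: bias as the difference between mean partial and mean whole EBV, dispersion as the regression slope of whole on partial EBV, and ratio of accuracies as their correlation. Predictivity follows the description in the text (correlation divided by the square root of the heritability). All function and variable names are hypothetical, and the reliability statistic is omitted because it needs quantities the abstract does not specify.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    # Sample covariance with an n - 1 denominator.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def lr_statistics(ebv_partial, ebv_whole):
    """LR-method statistics for the validation individuals.

    Assumed definitions (not stated in the abstract):
      bias                = mean(partial EBV) - mean(whole EBV)
      dispersion          = cov(whole, partial) / var(partial)
      ratio of accuracies = cor(partial, whole)
    """
    return {
        "bias": mean(ebv_partial) - mean(ebv_whole),
        "dispersion": covariance(ebv_whole, ebv_partial)
        / covariance(ebv_partial, ebv_partial),
        "ratio_of_accuracies": correlation(ebv_partial, ebv_whole),
    }


def predictivity(adjusted_phenotypes, ebv_partial, heritability):
    # Correlation between pre-adjusted phenotypes and partial EBV,
    # divided by the square root of the heritability.
    return correlation(adjusted_phenotypes, ebv_partial) / math.sqrt(heritability)


if __name__ == "__main__":
    # Made-up toy values for five validation individuals.
    partial = [0.10, -0.30, 0.50, 0.20, -0.40]
    whole = [0.20, -0.50, 0.80, 0.10, -0.60]
    phenotypes = [0.90, -1.20, 1.50, -0.20, -0.70]
    print(lr_statistics(partial, whole))
    print(predictivity(phenotypes, partial, heritability=0.30))
```

Under these definitions, a dispersion near 1 and a bias near 0 suggest that the early predictions are neither inflated nor shifted relative to the later ones.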

RESULTS

We derived standard errors and Wald confidence intervals for the predictivity and statistics included in the LR method (bias, dispersion, ratio of accuracies, and reliability). The confidence intervals for the bias, dispersion, and reliability depend on the relationships and prediction error variances and covariances across the individuals in the validation set. We developed approximations for large datasets that only need the reliabilities of the individuals in the validation set. The confidence intervals for the ratio of accuracies and predictivity were obtained through the Fisher transformation. We show the adequacy of both the analytical and approximated analytical confidence intervals and compare them versus bootstrap confidence intervals using two simulated examples. The analytical confidence intervals were closer to the simulated ones for both examples. Bootstrap confidence intervals tend to be narrower than the simulated ones. The approximated analytical confidence intervals were similar to those obtained by bootstrapping.
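As an illustration of the Fisher-transformation step, here is a minimal sketch of the textbook interval for a correlation. It assumes n independent observations and the usual standard error 1/sqrt(n - 3) on the transformed scale; the standard errors derived in the paper may differ, for example because validation individuals are related. Scaling the interval for predictivity by the square root of the heritability treats the heritability as known, which is also an assumption. The function names are hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation r.

    Textbook Fisher transformation: z = atanh(r) is treated as normal
    with standard error 1 / sqrt(n - 3), assuming n independent pairs.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def predictivity_ci(r_phenotype_ebv, n, heritability, z_crit=1.959964):
    # Interval for cor(adjusted phenotype, EBV) / sqrt(h2), obtained by
    # scaling the correlation interval; h2 is treated as a known constant.
    lower, upper = fisher_ci(r_phenotype_ebv, n, z_crit)
    h = math.sqrt(heritability)
    return lower / h, upper / h


if __name__ == "__main__":
    # Made-up inputs: a ratio of accuracies of 0.80 from 200 individuals.
    print(fisher_ci(0.80, 200))
    # Made-up inputs: phenotype-EBV correlation 0.30, h2 = 0.30.
    print(predictivity_ci(0.30, 200, 0.30))
```

Because the interval is built on the transformed scale and mapped back with tanh, it always stays within (-1, 1) and is asymmetric around r unless r is 0.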

CONCLUSIONS

Estimating the sampling variation of predictivity and the statistics in the LR method without replication or bootstrap is possible for any dataset with the formulas presented in this study.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a181/11234739/b3ec535cceaf/12711_2024_883_Fig1_HTML.jpg
