Predicted Residual Error Sum of Squares of Mixed Models: An Application for Genomic Prediction.

Author information

Xu Shizhong

Affiliation

Department of Botany and Plant Sciences, University of California, Riverside, California 92521

Publication information

G3 (Bethesda). 2017 Mar 10;7(3):895-909. doi: 10.1534/g3.116.038059.

Abstract

Genomic prediction is a statistical method to predict phenotypes of polygenic traits using high-throughput genomic data. Most diseases and behaviors in humans and animals are polygenic traits. The majority of agronomic traits in crops are also polygenic. Accurate prediction of these traits can help medical professionals diagnose acute diseases and help breeders increase food production, and can therefore contribute significantly to human health and global food security. The best linear unbiased prediction (BLUP) is an important tool to analyze high-throughput genomic data for prediction. However, to judge the efficacy of the BLUP model with a particular set of predictors for a given trait, one has to provide an unbiased mechanism to evaluate the predictability. Cross-validation (CV) is an essential tool to achieve this goal: a sample is partitioned into K parts of roughly equal size, one part is predicted using parameters estimated from the remaining K - 1 parts, and eventually every part is predicted using a sample excluding that part. Such a CV is called the K-fold CV. Unfortunately, CV substantially increases the computational burden. We developed an alternative method, the HAT method, to replace CV. The new method corrects the estimated residual errors from the whole-sample analysis using the leverage values of a hat matrix of the random effects to obtain the predicted residual errors. Properties of the HAT method were investigated using seven agronomic and 1000 metabolomic traits of an inbred rice population. Results showed that the HAT method is a very good approximation of the CV method. The method was also applied to 10 traits in 1495 hybrid rice with 1.6 million SNPs, and to human height of 6161 subjects with roughly 0.5 million SNPs of the Framingham heart study data. Predictabilities of the HAT and CV methods were all similar. The HAT method allows us to easily evaluate the predictabilities of genomic prediction for large numbers of traits in very large populations.
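
The leverage correction described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions that are not in the abstract: the model has no fixed effects, the ratio of residual to genetic variance (`lam`) is treated as known rather than estimated, the hat matrix takes the ridge-type form H = K (K + lam I)^-1, and the markers and phenotypes are simulated. The function names `hat_press` and `loo_press` are hypothetical. Under these assumptions, dividing each whole-sample residual by one minus its leverage reproduces the leave-one-out predicted residual exactly; when variance components are re-estimated in every fold, the correspondence would be approximate.

```python
import numpy as np

def hat_press(K, y, lam):
    """Predicted residuals from ONE whole-sample fit.

    Assumes a ridge-type hat matrix H = K (K + lam*I)^-1 with a fixed
    variance ratio lam and no fixed effects (illustrative assumptions).
    """
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    resid = y - H @ y                      # estimated residuals
    return resid / (1.0 - np.diag(H))      # leverage-corrected residuals

def loo_press(K, y, lam):
    """Predicted residuals from explicit leave-one-out CV (n refits)."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        K_train = K[np.ix_(keep, keep)]
        alpha = np.linalg.solve(K_train + lam * np.eye(n - 1), y[keep])
        out[i] = y[i] - K[i, keep] @ alpha  # observed minus predicted
    return out

# Simulated data (purely illustrative, not from the paper)
rng = np.random.default_rng(0)
n, m = 60, 200
Z = rng.standard_normal((n, m))            # hypothetical marker matrix
K = Z @ Z.T / m                            # marker-derived kinship matrix
y = 0.1 * (Z @ rng.standard_normal(m)) + rng.standard_normal(n)
lam = 1.0                                  # assumed variance ratio

r_hat = hat_press(K, y, lam)
r_loo = loo_press(K, y, lam)
press_hat = float(r_hat @ r_hat)           # PRESS from the leverage shortcut
press_loo = float(r_loo @ r_loo)           # PRESS from brute-force CV
```

The shortcut needs a single matrix inversion, whereas the explicit loop solves one linear system per left-out individual, which is the computational saving the abstract refers to.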

