School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA.
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.
Hum Mol Genet. 2023 Aug 26;32(17):2693-2703. doi: 10.1093/hmg/ddad097.
Recently, a non-parametric method was proposed to impute the genetic component of a trait for a large set of genotyped individuals, based on a separate genome-wide association study (GWAS) summary dataset of the same trait from the same population. The imputed trait may contain linear, non-linear and epistatic effects of genetic variants, and thus can be used for downstream linear or non-linear association analyses and machine learning tasks. Here, we propose an extension of the method that imputes both the genetic and environmental components of a trait using both single nucleotide polymorphism (SNP)-trait and omics-trait association summary data. We illustrate an application to a subset of UK Biobank individuals (n ≈ 80K) with both body mass index (BMI) GWAS data and metabolomic data. We divided the whole dataset into two equally sized, non-overlapping training and test datasets; we used the training data to build SNP-BMI and metabolite-BMI association summary data, and then imputed BMI on the test data. We compared the performance of the original and new imputation methods. As with the original method, the BMI values imputed by the new method largely retained SNP-BMI association information; however, the values imputed by the new method retained more information about BMI-environment associations and were more highly correlated with the observed BMI values.
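The data-preparation step described above (an equal, non-overlapping split, followed by building marginal association summary data on the training half) can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: the names `split_half` and `marginal_summary`, the toy sample size, and the simulated effect sizes are all assumptions, and the imputation step itself is not shown because the abstract does not specify it.

```python
import numpy as np

def split_half(n, rng):
    """Return two equally sized, non-overlapping index arrays (hypothetical helper)."""
    perm = rng.permutation(n)
    half = n // 2
    return perm[:half], perm[half:2 * half]

def marginal_summary(X, y):
    """Per-column simple linear regression of y on each column of X.

    Returns the slope estimates and their standard errors, i.e. the kind of
    marginal association summary statistics a GWAS reports (hypothetical helper).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)           # center each predictor
    yc = y - y.mean()                 # center the trait
    sxx = (Xc ** 2).sum(axis=0)       # per-column sum of squares
    beta = Xc.T @ yc / sxx            # marginal slopes
    rss = (yc ** 2).sum() - beta ** 2 * sxx   # residual sum of squares per column
    se = np.sqrt(rss / (n - 2) / sxx)         # standard error of each slope
    return beta, se

# Simulated stand-ins (assumptions): genotypes G, metabolites M, and a toy trait y.
rng = np.random.default_rng(0)
n, p, q = 2000, 50, 10
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # additive SNP coding 0/1/2
M = rng.normal(size=(n, q))                           # metabolite levels
y = (G @ rng.normal(scale=0.05, size=p)
     + M @ rng.normal(scale=0.3, size=q)
     + rng.normal(size=n))

# Split once, then build summary data from the training half only.
train, test = split_half(n, rng)
snp_beta, snp_se = marginal_summary(G[train], y[train])
met_beta, met_se = marginal_summary(M[train], y[train])
```

The test half (`G[test]`, `M[test]`, `y[test]`) is held out so that any imputed trait values can be compared against the observed ones, for example by their correlation.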