A computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional quantitative trait loci discovery.

Author information

Bottolo Leonardo, Banterle Marco, Richardson Sylvia, Ala-Korpela Mika, Järvelin Marjo-Riitta, Lewin Alex

Affiliations

Department of Medical Genetics, University of Cambridge, Cambridge, UK.

The Alan Turing Institute, London, UK.

Publication information

J R Stat Soc Ser C Appl Stat. 2021 Aug;70(4):886-908. doi: 10.1111/rssc.12490. Epub 2021 May 8.

DOI:10.1111/rssc.12490

PMID:35001978

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7612194/

Abstract

Our work is motivated by the search for metabolite quantitative trait loci (QTL) in a cohort of more than 5000 people. There are 158 metabolites measured by NMR spectroscopy in the 31-year follow-up of the Northern Finland Birth Cohort 1966 (NFBC66). These metabolites, as with many multivariate phenotypes produced by high-throughput biomarker technology, exhibit strong correlation structures. Existing approaches for combining such data with genetic variants for multivariate QTL analysis generally ignore phenotypic correlations or make restrictive assumptions about the associations between phenotypes and genetic loci. We present a computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional data, with cell-sparse variable selection and sparse graphical structure for covariance selection. Cell sparsity allows different phenotype responses to be associated with different genetic predictors and the graphical structure is used to represent the conditional dependencies between phenotype variables. To achieve feasible computation of the large model space, we exploit a factorisation of the covariance matrix. Applying the model to the NFBC66 data with 9000 directly genotyped single nucleotide polymorphisms, we are able to simultaneously estimate genotype-phenotype associations and the residual dependence structure among the metabolites. The R package BayesSUR with full documentation is available at https://cran.r-project.org/web/packages/BayesSUR/.
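
To make "cell sparsity" and "sparse graphical structure" concrete, the following is a minimal simulation sketch of data with that shape. It is not the BayesSUR implementation and does not perform any inference. The names (`gamma`, `beta`, `omega`), the dimensions, the 10% inclusion rate, and the chain graph among responses are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): samples, predictors (e.g. SNPs), responses (e.g. metabolites).
n, p, q = 200, 30, 5

# Cell-sparse inclusion matrix: gamma[j, k] is True when predictor j is
# associated with response k, so each response can have its own predictor set.
gamma = rng.random((p, q)) < 0.1
beta = np.where(gamma, rng.normal(0.0, 1.0, size=(p, q)), 0.0)

# Sparse residual precision matrix: a zero entry omega[k, l] means responses
# k and l are conditionally independent given the others. A tridiagonal
# matrix corresponds to a chain graph (assumed here purely for illustration).
omega = np.eye(q) + 0.4 * (np.eye(q, k=1) + np.eye(q, k=-1))
sigma = np.linalg.inv(omega)  # residual covariance is dense even though omega is sparse

# Genotype-like predictors coded 0/1/2 and correlated Gaussian residuals.
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
E = rng.multivariate_normal(np.zeros(q), sigma, size=n)

# Seemingly unrelated regressions: one regression per response column,
# linked only through the correlated residuals.
Y = X @ beta + E
```

In this sketch the regressions look unrelated because each column of `beta` has its own sparsity pattern, yet they share information through `sigma`. A model of the kind described in the abstract would aim to recover both `gamma` and the zero pattern of `omega` from `X` and `Y`.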
