A scalable estimator of SNP heritability for biobank-scale data.

Affiliations

Department of Computer Science, University of California, Los Angeles, CA, USA.

Department of Human Genetics, University of California, Los Angeles, CA, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i187-i194. doi: 10.1093/bioinformatics/bty253.

Abstract

MOTIVATION

Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation have motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation, where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens.

RESULTS

We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moments estimator that has a runtime complexity of O(NMB) for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMB / max(log_3 N, log_3 M)). We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min.
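
To make the O(NMB) cost concrete, the following is a minimal sketch of a randomized method-of-moments estimator of this kind. It is not the RHE-reg implementation. It assumes a standardized genotype matrix X (N x M), a centered phenotype y, the relatedness matrix K = X X^T / M, and Gaussian random probe vectors for estimating tr(K^2); the function names `simulate` and `rhe_sketch` are hypothetical. The speedup that exploits the structure of the genotype matrix is not shown.

```python
import numpy as np


def simulate(N, M, h2, seed=0):
    """Simulate standardized genotypes and a phenotype with heritability h2.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=M)            # allele frequencies
    G = rng.binomial(2, freqs, size=(N, M)).astype(float)
    X = (G - G.mean(axis=0)) / G.std(axis=0)          # mean 0, variance 1 per SNP
    beta = rng.normal(0.0, np.sqrt(h2 / M), size=M)   # per-SNP effect sizes
    eps = rng.normal(0.0, np.sqrt(1.0 - h2), size=N)  # environmental noise
    y = X @ beta + eps
    return X, y - y.mean()


def rhe_sketch(X, y, B=10, seed=0):
    """Method-of-moments variance components with a randomized trace estimate.

    Matches Cov(y) = sg2 * K + se2 * I to y y^T in the least-squares sense,
    which gives the 2 x 2 normal equations
        [tr(K^2)  tr(K)] [sg2]   [y^T K y]
        [tr(K)    N    ] [se2] = [y^T y  ]
    K is never formed; tr(K^2) is approximated with B random probe vectors.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    Z = rng.standard_normal((N, B))       # assumption: Gaussian probes
    KZ = X @ (X.T @ Z) / M                # two O(NMB) matrix products
    tr_K2 = np.sum(KZ ** 2) / B           # average of ||K z_b||^2 over probes
    tr_K = np.sum(X ** 2) / M             # exact trace of K
    yKy = np.sum((X.T @ y) ** 2) / M      # y^T K y without forming K
    yy = y @ y
    A = np.array([[tr_K2, tr_K], [tr_K, float(N)]])
    b = np.array([yKy, yy])
    sg2, se2 = np.linalg.solve(A, b)
    return sg2, se2, sg2 / (sg2 + se2)


if __name__ == "__main__":
    X, y = simulate(N=2000, M=500, h2=0.5, seed=1)
    sg2, se2, h2_hat = rhe_sketch(X, y, B=20, seed=2)
    print(f"sigma_g^2={sg2:.3f}  sigma_e^2={se2:.3f}  h2={h2_hat:.3f}")
```

In this sketch the only operations that touch the genotype matrix are products of X or X^T with N x B or M x B blocks, which is where the O(NMB) cost arises. Larger B reduces the variance of the trace estimate at proportionally higher cost.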

AVAILABILITY AND IMPLEMENTATION

The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.
