Computationally efficient whole-genome regression for quantitative and binary traits.

Affiliation

Regeneron Genetics Center, Tarrytown, NY, USA.

Publication information

Nat Genet. 2021 Jul;53(7):1097-1103. doi: 10.1038/s41588-021-00870-7. Epub 2021 May 20.

Abstract

Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
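The Firth correction mentioned in the abstract can be illustrated with a minimal sketch. The code below is not REGENIE's fast approximate test. It is the standard Firth-penalized logistic fit (Jeffreys-prior penalty) for an intercept plus one covariate, written in plain Python. The function name firth_logistic and the toy data are assumptions made for illustration only. The sketch shows why the penalty matters for unbalanced or separated case-control data: on the toy data the ordinary maximum-likelihood estimate diverges, while the penalized estimate stays finite.

```python
import math

def firth_logistic(x, y, n_iter=100, tol=1e-10):
    """Firth-penalized logistic regression, intercept + one covariate.

    Hypothetical helper for illustration. Assumes x takes at least two
    distinct values, so the 2x2 Fisher information is invertible.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Fitted probabilities and logistic weights p(1 - p)
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]

        # Fisher information I = X^T W X for design columns [1, x]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det  # entries of I^{-1}

        # Hat-matrix diagonal: h_i = w_i * [1, x_i] I^{-1} [1, x_i]^T
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi)
             for wi, xi in zip(w, x)]

        # Firth-modified score: X^T (y - p + h * (1/2 - p))
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))

        # Scoring step: beta <- beta + I^{-1} U*
        s0 = i00 * u0 + i01 * u1
        s1 = i01 * u0 + i11 * u1
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1

# Toy data with perfect separation: all cases carry x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = firth_logistic(x, y)
print(round(b0, 3), round(b1, 3))  # -2.197 4.394
```

For this saturated two-group design, the penalized fit coincides with adding 0.5 to each cell of the 2x2 table, which gives an intercept of -log(9) and a slope of 2 log(9), the values shown in the final comment.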
