GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies.

Affiliation

Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands.

Publication information

BMC Bioinformatics. 2013 May 28;14:166. doi: 10.1186/1471-2105-14-166.

DOI: 10.1186/1471-2105-14-166
PMID: 23711206
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3695771/
Abstract

BACKGROUND

Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many Gigabytes) which cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code.

RESULTS

We improve speed: computation time is reduced from 6 hours to 10-15 minutes. Using projections, our approach can efficiently handle an essentially unlimited number of covariates. Data files in GWAS are vast, and reading them into computer memory becomes an important issue. However, much improvement can be made if the data is structured beforehand in a way that allows easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf. We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are a few hundred times faster than standard procedures.

CONCLUSIONS

We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access.
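
The projection idea mentioned in the Results can be illustrated with a short sketch. It is not the authors' R implementation: the function names are hypothetical, plain Python loops stand in for the matrix operations the abstract refers to, and Gram-Schmidt orthogonalisation is an assumed way of building the projection. The phenotype and every SNP are residualised against the covariates once, after which each SNP needs only a few dot products. By the Frisch-Waugh-Lovell theorem, the resulting slope and standard error match those of a full per-SNP least-squares fit.

```python
import math


def orthonormal_basis(columns):
    """Modified Gram-Schmidt: orthonormal vectors spanning `columns`."""
    basis = []
    for col in columns:
        v = [float(a) for a in col]
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip linearly dependent columns
            basis.append([a / norm for a in v])
    return basis


def residualize(vec, basis):
    """Remove from `vec` its component in the covariate space."""
    r = [float(a) for a in vec]
    for q in basis:
        dot = sum(a * b for a, b in zip(q, r))
        r = [a - dot * b for a, b in zip(r, q)]
    return r


def semi_parallel_linear(y, covariates, snps):
    """Return (beta, se, t) per SNP, adjusted for intercept and covariates.

    y          : n phenotype values
    covariates : list of covariate columns, each of length n (may be empty)
    snps       : list of SNP columns, each of length n
    Assumes n exceeds the number of covariates by more than 2.
    A SNP with no variation left after adjustment yields None.
    """
    n = len(y)
    basis = orthonormal_basis([[1.0] * n] + list(covariates))
    df = n - len(basis) - 1          # residual degrees of freedom
    y_res = residualize(y, basis)    # computed once, shared by all SNPs
    yy = sum(a * a for a in y_res)
    results = []
    for g in snps:
        g_res = residualize(g, basis)
        gg = sum(a * a for a in g_res)
        if gg < 1e-12:               # monomorphic or collinear SNP
            results.append(None)
            continue
        gy = sum(a * b for a, b in zip(g_res, y_res))
        beta = gy / gg
        rss = max(yy - beta * gy, 0.0)   # guard against rounding below zero
        se = math.sqrt(rss / df / gg)
        results.append((beta, se, beta / se if se > 0 else float("inf")))
    return results


# Toy data: six individuals, one covariate, two SNPs coded 0/1/2.
phenotype = [1, 2, 4, 0, 3, 5]
age = [1, -2, 1, 1, -2, 1]
genotypes = [[0, 1, 2, 0, 1, 2], [2, 0, 1, 1, 0, 2]]
for index, stats in enumerate(semi_parallel_linear(phenotype, [age], genotypes)):
    print(index, stats)
```

In a vectorised setting the loop over SNPs collapses into a handful of matrix products over a whole block of SNPs, which is where the speed-up described in the abstract would come from.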

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3a0/3695771/843ce72adec5/1471-2105-14-166-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3a0/3695771/e9ed70e706c6/1471-2105-14-166-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3a0/3695771/7261a89f4b62/1471-2105-14-166-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3a0/3695771/ff003e946873/1471-2105-14-166-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3a0/3695771/2175f9451c0e/1471-2105-14-166-5.jpg

Similar articles

1
GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies.
BMC Bioinformatics. 2013 May 28;14:166. doi: 10.1186/1471-2105-14-166.
2
GWAS with longitudinal phenotypes: performance of approximate procedures.
Eur J Hum Genet. 2015 Oct;23(10):1384-91. doi: 10.1038/ejhg.2015.1. Epub 2015 Feb 25.
3
SurvivalGWAS_SV: software for the analysis of genome-wide association studies of imputed genotypes with "time-to-event" outcomes.
BMC Bioinformatics. 2017 May 19;18(1):265. doi: 10.1186/s12859-017-1683-z.
4
High-throughput analysis of epistasis in genome-wide association studies with BiForce.
Bioinformatics. 2012 Aug 1;28(15):1957-64. doi: 10.1093/bioinformatics/bts304. Epub 2012 May 21.
5
GWIS--model-free, fast and exhaustive search for epistatic interactions in case-control GWAS.
BMC Genomics. 2013;14 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2164-14-S3-S10. Epub 2013 May 28.
6
Heterogeneous computing architecture for fast detection of SNP-SNP interactions.
BMC Bioinformatics. 2014 Jun 25;15:216. doi: 10.1186/1471-2105-15-216.
7
Optimized homomorphic encryption solution for secure genome-wide association studies.
BMC Med Genomics. 2020 Jul 21;13(Suppl 7):83. doi: 10.1186/s12920-020-0719-9.
8
Fast computation for genome-wide association studies using boosted one-step statistics.
Bioinformatics. 2012 Jul 15;28(14):1818-22. doi: 10.1093/bioinformatics/bts291. Epub 2012 May 15.
9
ParallABEL: an R library for generalized parallelization of genome-wide association studies.
BMC Bioinformatics. 2010 Apr 29;11:217. doi: 10.1186/1471-2105-11-217.
10
rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool for Genome-wide Association Study.
Genomics Proteomics Bioinformatics. 2021 Aug;19(4):619-628. doi: 10.1016/j.gpb.2020.10.007. Epub 2021 Mar 2.

Cited by

1
GWAS Procedures for Gene Mapping in Diverse Populations With Complex Structures.
Bio Protoc. 2025 Apr 20;15(8):e5284. doi: 10.21769/BioProtoc.5284.
2
Federated privacy-protected meta- and mega-omics data analysis in multi-center studies with a fully open-source analytic platform.
PLoS Comput Biol. 2024 Dec 9;20(12):e1012626. doi: 10.1371/journal.pcbi.1012626. eCollection 2024 Dec.
3
Fast multiple-trait genome-wide association analysis for correlated longitudinal measurements.
Sci Rep. 2023 Nov 23;13(1):20603. doi: 10.1038/s41598-023-47555-1.
4
Privacy-preserving federated genome-wide association studies via dynamic sampling.
Bioinformatics. 2023 Oct 3;39(10). doi: 10.1093/bioinformatics/btad639.
5
Genome-Wide Meta-Analysis Identifies Multiple Novel Rare Variants to Predict Common Human Infectious Diseases Risk.
Int J Mol Sci. 2023 Apr 10;24(8):7006. doi: 10.3390/ijms24087006.
6
Achieving GWAS with homomorphic encryption.
BMC Med Genomics. 2020 Jul 21;13(Suppl 7):90. doi: 10.1186/s12920-020-0717-y.
7
iDASH secure genome analysis competition 2018: blockchain genomic data access logging, homomorphic encryption on GWAS, and DNA segment searching.
BMC Med Genomics. 2020 Jul 21;13(Suppl 7):98. doi: 10.1186/s12920-020-0715-0.
8
Privacy-preserving semi-parallel logistic regression training with fully homomorphic encryption.
BMC Med Genomics. 2020 Jul 21;13(Suppl 7):88. doi: 10.1186/s12920-020-0723-0.
9
Optimized homomorphic encryption solution for secure genome-wide association studies.
BMC Med Genomics. 2020 Jul 21;13(Suppl 7):83. doi: 10.1186/s12920-020-0719-9.
10
Privacy-preserving approximate GWAS computation based on homomorphic encryption.
BMC Med Genomics. 2020 Jul 21;13(Suppl 7):77. doi: 10.1186/s12920-020-0722-1.

References

1
Fast linear mixed model computations for genome-wide association studies with longitudinal data.
Stat Med. 2013 Jan 15;32(1):165-80. doi: 10.1002/sim.5517. Epub 2012 Aug 22.
2
Matrix eQTL: ultra fast eQTL analysis via large matrix operations.
Bioinformatics. 2012 May 15;28(10):1353-8. doi: 10.1093/bioinformatics/bts163. Epub 2012 Apr 6.
3
FaST linear mixed models for genome-wide association studies.
Nat Methods. 2011 Sep 4;8(10):833-5. doi: 10.1038/nmeth.1681.
4
MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.
Genet Epidemiol. 2010 Dec;34(8):816-34. doi: 10.1002/gepi.20533.
5
Genomewide association studies and assessment of the risk of disease.
N Engl J Med. 2010 Jul 8;363(2):166-76. doi: 10.1056/NEJMra0905980.
6
ProbABEL package for genome-wide association analysis of imputed data.
BMC Bioinformatics. 2010 Mar 16;11:134. doi: 10.1186/1471-2105-11-134.
7
Genotype imputation.
Annu Rev Genomics Hum Genet. 2009;10:387-406. doi: 10.1146/annurev.genom.9.081307.164242.
8
GRIMP: a web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data.
Bioinformatics. 2009 Oct 15;25(20):2750-2. doi: 10.1093/bioinformatics/btp497. Epub 2009 Aug 21.
9
How to interpret a genome-wide association study.
JAMA. 2008 Mar 19;299(11):1335-44. doi: 10.1001/jama.299.11.1335.
10
PLINK: a tool set for whole-genome association and population-based linkage analyses.
Am J Hum Genet. 2007 Sep;81(3):559-75. doi: 10.1086/519795. Epub 2007 Jul 25.