
CCAFE: Estimating Case and Control Allele Frequencies from GWAS Summary Statistics.

Authors

Stoneman Hayley R, Price Adelle, Gignoux Christopher R, Hendricks Audrey E

Affiliations

Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.

Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.

Publication information

bioRxiv. 2024 Oct 29:2024.10.24.619530. doi: 10.1101/2024.10.24.619530.

DOI:10.1101/2024.10.24.619530
PMID:39554201
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11565872/
Abstract

Methods involving summary statistics in genetics can be quite powerful but can be limited in utility. For instance, many post-hoc analyses of disease studies require case and control allele frequencies (AFs), which are not always published. We present two frameworks to derive case and control AFs from GWAS summary statistics using the odds ratio, case and control sample sizes, and either the total (case and control aggregated) AF or the standard error (SE). In simulations and real data, derivations of case and control AFs using the total AF are highly accurate across all settings (e.g., minor AF, condition prevalence). Conversely, derivations using the SE underestimate common variant AFs (e.g., minor allele frequency >0.3) in the presence of covariates. We develop an adjustment using gnomAD AFs as a proxy for true AFs, which reduces the bias when using the SE. While estimating case and control AFs using the total AF is preferred due to its high accuracy, estimating from the SE can be used more broadly, since the SE can be derived from p-values and beta estimates, which are commonly provided. The methods provided here expand the utility of publicly available genetic summary statistics and promote the reusability of genomic data. The R package, with implementations of both methods, is freely available on Bioconductor and GitHub.
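
The total-AF framework described in the abstract can be illustrated with a short sketch. It is not the CCAFE package's code, and the function names below are hypothetical. It assumes the odds ratio is the allelic odds ratio, (AF_case / (1 - AF_case)) / (AF_control / (1 - AF_control)), and that the total AF is the sample-size-weighted mean of the case and control AFs. Under those two assumptions the control AF is the root in [0, 1] of a quadratic, and the case AF follows from the weighted mean. The second function shows how an SE could be recovered from a beta estimate and a p-value, assuming a two-sided Wald test with a normal reference distribution.

```python
import math
from statistics import NormalDist


def case_control_af_from_total(odds_ratio, n_case, n_control, af_total):
    """Return (af_case, af_control) from OR, sample sizes, and pooled AF.

    Hypothetical helper, not the CCAFE API. Assumes an allelic odds ratio
    and a pooled AF equal to the sample-size-weighted mean of both groups.
    """
    if odds_ratio <= 0 or n_case <= 0 or n_control <= 0:
        raise ValueError("odds_ratio and sample sizes must be positive")
    if not 0.0 <= af_total <= 1.0:
        raise ValueError("af_total must lie in [0, 1]")

    n_total = n_case + n_control
    # Substituting af_case = OR*p / (1 - p + OR*p) into the weighted mean
    # gives a*p^2 + b*p + c = 0, where p is the control AF.
    a = n_control * (odds_ratio - 1.0)
    b = n_case * odds_ratio + n_control - n_total * af_total * (odds_ratio - 1.0)
    c = -n_total * af_total

    if c == 0.0:
        af_control = 0.0
    else:
        # Clamp tiny negative discriminants caused by rounding.
        disc = max(b * b - 4.0 * a * c, 0.0)
        # This form of the quadratic formula selects the root in [0, 1]
        # and stays valid when odds_ratio == 1 (a == 0).
        af_control = 2.0 * c / (-b - math.sqrt(disc))

    af_case = (n_total * af_total - n_control * af_control) / n_case
    return af_case, af_control


def se_from_beta_and_p(beta, p_value):
    """Recover SE(beta) from a two-sided Wald p-value (assumed test type)."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    z = abs(NormalDist().inv_cdf(p_value / 2.0))
    return abs(beta) / z


if __name__ == "__main__":
    # Made-up inputs: true AFs are 0.3 (cases) and 0.2 (controls).
    odds = (0.3 / 0.7) / (0.2 / 0.8)
    pooled = (1000 * 0.3 + 2000 * 0.2) / 3000
    print(case_control_af_from_total(odds, 1000, 2000, pooled))
```

The SE-based framework and the gnomAD adjustment mentioned in the abstract are not reproduced here, because the abstract does not give enough detail to sketch them faithfully.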


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e47/11565872/b8ca661bf90d/nihpp-2024.10.24.619530v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e47/11565872/508f976a13b3/nihpp-2024.10.24.619530v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e47/11565872/9790cbc29d13/nihpp-2024.10.24.619530v1-f0002.jpg
