Bayesian Effect Size Ranking to Prioritise Genetic Risk Variants in Common Diseases for Follow-Up Studies.

Author information

Crouch Daniel J M, Inshaw Jamie R J, Robertson Catherine C, Ng Esther, Zhang Jia-Yuan, Chen Wei-Min, Onengut-Gumuscu Suna, Cutler Antony J, Sidore Carlo, Cucca Francesco, Pociot Flemming, Concannon Patrick, Rich Stephen S, Todd John A

Affiliations

JDRF/Wellcome Diabetes and Inflammation Laboratory, Nuffield Department of Medicine, Centre for Human Genetics, NIHR Oxford Biomedical Research Centre, University of Oxford, Oxford, UK.

Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA.

Publication information

Genet Epidemiol. 2025 Jan;49(1):e22608. doi: 10.1002/gepi.22608.

Abstract

Biological datasets often consist of thousands or millions of variables, for example genetic variants or biomarkers. When sample sizes are large, it is common to find many variables associated with an outcome of interest (for example, disease risk in a GWAS) at high levels of statistical significance but with very small effects. The False Discovery Rate (FDR) is used to identify effects of interest by ranking variables according to their statistical significance. Here, we develop a complementary measure to the FDR, the priorityFDR, that ranks variables by a combination of effect size and significance, allowing further prioritisation among a set of variables that pass a significance or FDR threshold. Applying it to the largest GWAS of type 1 diabetes to date (15,573 cases and 158,408 controls), we identified 26 independent genetic associations, including two newly reported loci, with qualitatively lower priorityFDRs than the remaining 175 signals. We detected putatively causal type 1 diabetes risk genes using Mendelian Randomisation, and found that these were located disproportionately close to low priorityFDR signals (p = 0.005), as were genes in the IL-2 pathway (p = 0.003). Selecting variables on both effect size and significance can lead to improved prioritisation for mechanistic follow-up studies from genetic and other large biological datasets.
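To make the contrast between significance-only ranking and effect-aware ranking concrete, here is a minimal sketch. It is not the priorityFDR itself, whose definition is not given in this abstract. It computes standard Benjamini-Hochberg FDR q-values, then, as a simplifying assumption, reorders the variables that pass an FDR threshold by absolute effect size. The variant names, effect sizes (imagined as log odds ratios), p-values and the helper `shortlist_by_effect` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def shortlist_by_effect(
    names: Sequence[str],
    effects: Sequence[float],
    pvalues: Sequence[float],
    fdr_threshold: float = 0.05,
) -> List[Tuple[str, float, float]]:
    """Keep variables with q <= threshold, then sort by |effect|, largest first.

    Illustrative stand-in only: this is NOT the paper's priorityFDR.
    """
    q = bh_qvalues(pvalues)
    passing = [
        (name, effect, qv)
        for name, effect, qv in zip(names, effects, q)
        if qv <= fdr_threshold
    ]
    return sorted(passing, key=lambda row: abs(row[1]), reverse=True)


# Hypothetical toy data: v1 is the most significant but has the smallest effect.
names = ["v1", "v2", "v3", "v4"]
effects = [0.02, 0.40, -0.15, 0.30]
pvalues = [1e-12, 1e-6, 1e-4, 0.2]

qvals = bh_qvalues(pvalues)
ranked = shortlist_by_effect(names, effects, pvalues)
for name, effect, qv in ranked:
    print(f"{name}: effect={effect:+.2f}, q={qv:.3g}")
```

In this toy example v1 would head a significance-only ranking, but it falls to the bottom of the shortlist once effect size is considered, while v4 is excluded because it does not pass the FDR threshold. That is the kind of reordering the abstract motivates, although the published method combines effect size and significance in a single measure rather than in two separate steps.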

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f13/11696485/fce611b20186/GEPI-49-0-g002.jpg
