Simplifying causal gene identification in GWAS loci.

Authors

Schipper Marijn, Ulirsch Jacob, Posthuma Danielle, Ripke Stephan, Heilbron Karl

Affiliations

Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.

Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.

Publication information

medRxiv. 2025 Jan 25:2024.07.26.24311057. doi: 10.1101/2024.07.26.24311057.

Abstract

Genome-wide association studies (GWAS) help to identify disease-linked genetic variants, but pinpointing the most likely causal genes in GWAS loci remains challenging. Existing GWAS gene prioritization tools are powerful but often use complex black box models trained on datasets containing unaddressed biases. Here, we use a data-driven approach to construct a truth set of causal genes in 406 GWAS loci. We train a gene prioritization tool, CALDERA, that uses a simple logistic regression model with L1 regularization and corrects for potential confounders. Using three independent benchmarking datasets of resolved GWAS loci, we compare the performance of CALDERA with three other methods (FLAMES, L2G, and cS2G). CALDERA outperforms all these methods in two out of three datasets and ranks second in the remaining dataset. We demonstrate that CALDERA prioritizes genes with expected properties, such as mutation intolerance (OR = 1.751 for pLI > 90%, P = 8.45×10). Overall, CALDERA provides a powerful solution for prioritizing potentially causal genes in GWAS loci and may help identify novel genetics-driven drug targets.
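To make the modelling idea in the abstract concrete, the following is a minimal sketch of a logistic regression with an L1 penalty, fitted by proximal gradient descent in plain Python. It is not the CALDERA implementation: the function names (`fit_l1_logistic`, `soft_threshold`), the two features, the penalty strength `lam`, and the 16-gene toy dataset are all hypothetical and chosen only for illustration. The sketch shows two general properties of this model class: the L1 penalty can set the weight of an uninformative feature to exactly zero, and exponentiating a fitted coefficient gives an odds ratio.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    # Minimises mean log-loss + lam * sum(|w_j|); the intercept is not penalised.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the smooth loss, then the L1 shrinkage step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data (hypothetical, NOT from the paper): each row is one candidate gene.
# Column 0: 1.0 if the gene is mutation-intolerant, else 0.0.
# Column 1: an uninformative feature, balanced at +1/-1 within every group.
X, y = [], []
for intolerant, causal, n_pairs in [(1, 1, 3), (0, 1, 1), (0, 0, 3), (1, 0, 1)]:
    for _ in range(n_pairs):
        for noise in (1.0, -1.0):
            X.append([float(intolerant), noise])
            y.append(causal)

weights, intercept = fit_l1_logistic(X, y)
odds_ratios = [math.exp(wj) for wj in weights]  # exp(coefficient) = odds ratio
print("weights:", weights)
print("odds ratios:", odds_ratios)
```

In this toy setting the informative feature keeps a positive weight (an odds ratio above 1), while the uninformative feature's weight stays at zero, which is why an L1-penalised model can remain small and easy to inspect. The odds ratio reported in the abstract comes from the authors' own analysis and is not reproduced by this sketch.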

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9bd0/11867536/1fe76a1f38ff/nihpp-2024.07.26.24311057v2-f0001.jpg
