Empirical Inference, Max-Planck Institute for Intelligent Systems, Tübingen 72076, Germany.
Université Paris Cité, INSERM, UMRS-1124, Paris F-75006, France.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae014.
Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS lack statistical power. In recent years, however, applying information diffusion algorithms to molecular networks has yielded fruitful insights into disease genes.
We present an overview of the design choices and pitfalls that prove crucial when applying network propagation methods to GWAS summary statistics. We highlight general trends from the literature and present benchmark experiments that expand on these insights, using three diseases and five molecular networks as a case study. We verify that, provided the GWAS summary statistics are of sufficient quality, gene-level scores based on GWAS P-values offer advantages over a set of 'seed' disease genes that is not weighted by the associated P-values. Beyond that, the size and density of the networks prove to be important factors to consider. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
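To make the contrast between P-value-weighted gene scores and unweighted seed genes concrete, the sketch below runs one common form of network propagation, a random walk with restart, on a toy network. The abstract does not say which propagation algorithm, score transformation or ensemble method the authors used, so everything here is an assumption made for illustration: the choice of random walk with restart, the -log10(P) transformation, the 5e-8 seed threshold, the averaging ensemble, the toy networks and P-values, and the names `adjacency` and `propagate`.

```python
import numpy as np

def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from a list of undirected edges."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def propagate(adj, scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart (illustrative choice, not the paper's stated method)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0           # avoid division by zero for isolated nodes
    W = adj / deg                 # column-normalised transition matrix
    p0 = np.asarray(scores, dtype=float)
    p0 = p0 / p0.sum()            # restart distribution sums to 1
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical gene-level P-values for five genes (made up for this example).
pvals = np.array([1e-8, 1e-3, 0.5, 0.2, 0.04])

# Option 1: weighted initial scores, here -log10(P) (an assumed transformation).
weighted = -np.log10(pvals)

# Option 2: binary seed set, keeping only genes below an assumed threshold.
seeds = (pvals < 5e-8).astype(float)

# Two toy networks over the same five genes.
net_a = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)])
net_b = adjacency(5, [(0, 1), (0, 2), (2, 3), (3, 4), (1, 4)])

scores_weighted = propagate(net_a, weighted)
scores_seeds = propagate(net_a, seeds)

# One very simple ensemble: average the propagated scores across networks.
ensemble = np.mean([propagate(net, weighted) for net in (net_a, net_b)], axis=0)

print("weighted:", np.round(scores_weighted, 3))
print("seeds:   ", np.round(scores_seeds, 3))
print("ensemble:", np.round(ensemble, 3))
```

With binary seeds, all restart mass sits on the single gene that passes the threshold, whereas the weighted version also lets sub-threshold genes contribute in proportion to their evidence. That extra signal is only useful when the underlying P-values are reliable, which is consistent with the caveat in the abstract about summary statistics quality.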