Manios Georgios A, Michailidi Aikaterini, Kontou Panagiota I, Bagos Pantelis G
Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
Department of Mathematics, University of Thessaly, 35131, Lamia, Greece.
BMC Bioinformatics. 2025 Apr 16;26(1):107. doi: 10.1186/s12859-025-06119-y.
Genome-wide association studies (GWAS) have identified associations between genetic variants and diseases, but they genotype only a small fraction of single nucleotide polymorphisms (SNPs). To extend these findings, genotypes for unmeasured SNPs can be imputed, which improves coverage and statistical power. When genotype imputation is not possible, summary statistics imputation can be used as an alternative. The available summary statistics imputation tools rely on reference panels, such as the 1000 Genomes Project, to estimate linkage disequilibrium (LD) between variants for accurate imputation. Tools like FAPI and SSIMP read these reference panels in variant call format (VCF) for this purpose, which can be time-consuming. A more efficient approach to processing reference panels for summary statistics imputation was proposed in RAISS: the LD among the variants is precomputed from the reference panel before imputation, thereby reducing computational time.
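The precomputation idea can be illustrated with a minimal sketch. It is not the RAISS or PRED-LD implementation: the function names, the toy genotype panel, the variant IDs, and the r² cutoff of 0.8 are all assumptions made for illustration. The sketch computes the pairwise Pearson correlation between variants once and stores only the strongly linked pairs, so that a later imputation step can look them up instead of re-reading the panel.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # monomorphic variant: correlation is undefined
    return sxy / sqrt(sxx * syy)

def precompute_ld(genotypes, r2_min=0.8):
    """Build a lookup {(id_a, id_b): r} for variant pairs with r^2 >= r2_min.

    genotypes: dict mapping a variant ID to its allele counts across the
    reference samples (hypothetical in-memory stand-in for a reference panel).
    r2_min: assumed cutoff, chosen here only for illustration.
    """
    table = {}
    for id_a, id_b in combinations(sorted(genotypes), 2):
        r = pearson_r(genotypes[id_a], genotypes[id_b])
        if r is not None and r * r >= r2_min:
            table[(id_a, id_b)] = r
    return table

# Toy panel of five samples and three variants (made-up data).
panel = {
    "rsA": [0, 1, 2, 1, 0],
    "rsB": [0, 1, 2, 1, 0],  # identical to rsA, so in perfect LD with it
    "rsC": [2, 1, 0, 2, 1],  # only moderately correlated with rsA and rsB
}
ld_table = precompute_ld(panel)
```

Looping over all pairs is quadratic in the number of variants, so a practical tool would restrict the comparison to variants within a genomic window. The sketch omits that step for brevity.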
We present PRED-LD, an imputation method for GWAS summary statistics that aims to enhance the resolution of genetic association analyses. The method uses precomputed linkage disequilibrium (LD) statistics from HapMap, Pheno Scanner and TOP-LD to impute summary statistics, given beta coefficients and standard errors. The single-point approach that we describe provides a fast and accurate way to estimate associations for untyped single nucleotide polymorphisms that are in high LD with typed variants. Compared to existing tools, the proposed method is faster and provides accurate imputation. It has been implemented as both a web service ( https://compgen.dib.uth.gr/PRED-LD/ ) and a command-line tool ( https://github.com/pbagos/PRED-LD ), making it a useful resource for the research community.
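To make the single-point idea concrete, the sketch below uses a widely used proxy approximation: the z-score of an untyped variant is taken as the typed variant's z-score scaled by their signed LD correlation r. The abstract does not state PRED-LD's exact formulas, so this is an assumption rather than the authors' implementation. The function name, the r² cutoff of 0.8, and the example numbers are likewise hypothetical.

```python
from math import erfc, sqrt

def impute_single_point(beta, se, r, r2_min=0.8):
    """Approximate the association of an untyped SNP from one typed proxy.

    beta, se: effect estimate and standard error of the typed SNP.
    r: signed LD correlation between the typed and untyped SNP, e.g. taken
       from a precomputed LD table.
    r2_min: assumed cutoff; pairs in weaker LD are not imputed.
    Returns (z_imputed, p_value), or None if the pair is below the cutoff.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    if r * r < r2_min:
        return None
    z_typed = beta / se
    z_imputed = r * z_typed  # proxy approximation: E[z_untyped | z_typed] = r * z_typed
    p_value = erfc(abs(z_imputed) / sqrt(2))  # two-sided p-value under a standard normal
    return z_imputed, p_value

# Made-up example: typed SNP with beta = 0.12 and SE = 0.03, proxy in LD with r = 0.95.
result = impute_single_point(beta=0.12, se=0.03, r=0.95)
weak = impute_single_point(beta=0.12, se=0.03, r=0.5)  # r^2 = 0.25, below the cutoff
```

Because |r| is at most 1, the imputed z-score under this approximation is never more extreme than the typed one. That is one reason such an approach is restricted to variants in high LD.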
PRED-LD offers an efficient and accurate method for GWAS summary statistics imputation, providing faster performance, direct result interpretation, and the ability to use multiple reference panels. In addition, the online version of PRED-LD simplifies obtaining LD information and performing imputation without downloading reference panels, and it will be continuously updated to support tools for meta-analysis and fine-mapping in GWAS.