Rivier Cyprien A, Clocchiatti-Tuozzo Santiago, Huo Shufan, Torres-Lopez Victor, Renedo Daniela, Sheth Kevin N, Falcone Guido J, Acosta Julian N
Department of Neurology, Yale School of Medicine, New Haven, CT 06510, United States.
Yale Center for Brain and Mind Health, Yale School of Medicine, New Haven, CT 06510, United States.
Bioinform Adv. 2024 Dec 24;5(1):vbae207. doi: 10.1093/bioadv/vbae207. eCollection 2025.
The expansion of genetic association data from genome-wide association studies has increased the importance of methodologies like Polygenic Risk Scores (PRS) and Mendelian Randomization (MR) in genetic epidemiology. However, their application is often impeded by complex, multi-step workflows requiring specialized expertise and the use of disparate tools with varying data formatting requirements. Existing solutions are frequently standalone packages or command-line based, largely due to dependencies on tools like PLINK, which limits accessibility for researchers without computational experience. Given Python's popularity and ease of use, there is a need for an integrated, user-friendly Python toolkit to streamline PRS and MR analyses.
We introduce Genal, a Python package that consolidates SNP-level data handling, cleaning, clumping, PRS computation, and MR analyses into a single, cohesive toolkit. Genal wraps PLINK so that users do not need to interact with the command line, and it removes the need to combine multiple R packages, lowering the barrier for medical scientists to perform complex genetic epidemiology studies. Genal draws on concepts from several well-established tools, ensuring that users have access to rigorous statistical techniques in the intuitive Python environment. Additionally, Genal leverages parallel processing for MR methods, including MR-PRESSO, significantly reducing the computational time required for these analyses.
The package is available on PyPI (https://pypi.org/project/genal-python/). The code is openly available on GitHub, together with a tutorial (https://github.com/CypRiv/genal), and the documentation is hosted on Read the Docs (https://genal.rtfd.io).
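For readers unfamiliar with the two analyses named above, the following is a minimal sketch of the underlying arithmetic: a PRS as a weighted sum of effect-allele dosages, and a fixed-effect inverse-variance weighted (IVW) MR estimate built from per-SNP Wald ratios. It is written with plain NumPy and does not use or reproduce Genal's API; the function names `polygenic_risk_score` and `ivw_estimate` and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np


def polygenic_risk_score(dosages, weights):
    """Return one score per individual: the weighted sum of allele dosages.

    dosages: array of shape (n_individuals, n_snps), effect-allele counts (0-2)
    weights: array of shape (n_snps,), per-SNP effect sizes (e.g. GWAS betas)
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect and its standard error.

    Each SNP contributes a Wald ratio (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    ratios = by / bx
    w = bx**2 / se**2
    estimate = np.sum(w * ratios) / np.sum(w)
    std_error = np.sqrt(1.0 / np.sum(w))
    return estimate, std_error


# Toy data (hypothetical): two individuals, three SNPs.
scores = polygenic_risk_score([[0, 1, 2], [2, 0, 1]], [0.1, 0.2, -0.05])

# Toy summary statistics (hypothetical): two instruments.
causal_effect, causal_se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.02])
```

In practice these steps are preceded by data cleaning and clumping, which this sketch omits. The abstract does not specify how Genal implements any of these steps, so the package's own documentation should be consulted for its actual interface.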