Zhang Zheyu, Zhou Jintong, Cao Tianze, Huang Yuexia, Huang Chu, Xia Yu
School of Mathematics, Hangzhou Normal University, Hangzhou, Zhejiang, China.
PeerJ. 2025 Feb 12;13:e18985. doi: 10.7717/peerj.18985. eCollection 2025.
The Polygenic Score (PGS) Catalog is a public database dedicated to storing polygenic risk scores. To date, the database includes 5,022 polygenic risk scores associated with 656 different traits. Although the PGS Catalog offers an official representational state transfer (REST) application programming interface (API), there is no ready-made data client tailored to any specific programming language. Researchers must therefore spend time learning the structure of the REST API and implementing a client in their programming language of choice before they can integrate PGS data into their analytical workflows.
In this work, we introduce pandasPGS, a Python package that provides programmatic access to PGS Catalog data. When a researcher calls one of its functions, pandasPGS automatically selects the appropriate uniform resource locator (URL) based on the function's name and parameters, requests the data, and merges the paginated results. pandasPGS also provides data pre-processing functions: following the structure of the retrieved data, it converts the data into several hierarchical pandas.DataFrame objects that are convenient for further analysis.
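The workflow described above (follow paginated responses, merge them, then split nested records into related tables) can be sketched as follows. This is a minimal illustration and not the pandasPGS implementation: the helper names `merge_pages` and `to_frames`, the in-memory `FAKE_PAGES` stand-in for HTTP responses, and the record fields (`id`, `name`, `traits`, `label`) are all hypothetical, and the `next`/`results` page layout is assumed rather than taken from the source.

```python
from typing import Callable, Dict, List, Optional

import pandas as pd


def merge_pages(first_url: str, fetch_json: Callable[[str], dict]) -> List[dict]:
    """Follow each page's 'next' link and collect every 'results' entry.

    Assumes each page is a dict with a 'results' list and a 'next' URL
    (None on the last page); this layout is an assumption.
    """
    records: List[dict] = []
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        records.extend(page.get("results", []))
        url = page.get("next")
    return records


def to_frames(records: List[dict]) -> Dict[str, pd.DataFrame]:
    """Split nested records into a parent table and a child table.

    The child table keeps 'score_id' so rows can be joined back to the parent.
    Field names here are hypothetical.
    """
    scores = pd.DataFrame(
        [{"id": r["id"], "name": r["name"]} for r in records],
        columns=["id", "name"],
    )
    traits = pd.DataFrame(
        [
            {"score_id": r["id"], "trait_id": t["id"], "label": t["label"]}
            for r in records
            for t in r.get("traits", [])
        ],
        columns=["score_id", "trait_id", "label"],
    )
    return {"scores": scores, "traits": traits}


# Hypothetical two-page response, kept in memory so the sketch runs offline.
FAKE_PAGES = {
    "page1": {
        "next": "page2",
        "results": [
            {"id": "S1", "name": "score one",
             "traits": [{"id": "T1", "label": "trait A"}]},
        ],
    },
    "page2": {
        "next": None,
        "results": [
            {"id": "S2", "name": "score two",
             "traits": [{"id": "T1", "label": "trait A"},
                        {"id": "T2", "label": "trait B"}]},
        ],
    },
}

records = merge_pages("page1", FAKE_PAGES.__getitem__)
frames = to_frames(records)
print(frames["scores"])
print(frames["traits"])
```

Passing the fetch function as a parameter keeps the pagination logic independent of the HTTP layer; against a live API one could, for example, supply a function built on `requests.get(url).json()` instead of the in-memory lookup used here.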
This tool allows researchers to analyze PGS Catalog data easily using Python, and it reduces the time they would otherwise spend learning the PGS Catalog REST API. The source code is available at https://github.com/tianzelab/pandaspgs, and the API documentation is available at https://tianzelab.github.io/pandaspgs/.