Department of General Biology, School of Medicine, University of Patras, Patras, Greece.
Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research and Technology-Hellas (FORTH/ICE-HT), Patras, Greece.
Hum Genomics. 2024 Feb 8;18(1):15. doi: 10.1186/s40246-023-00565-6.
Analyzing genome-wide association study (GWAS) data for a complex disease phenotype in the context of the protein-protein interaction (PPI) network is valuable, because the related pathophysiology results from the function of interacting multiprotein pathways. Such an analysis may include the design and curation of a phenotype-specific GWAS meta-database that incorporates genotypic and eQTL data and links them to PPI and other biological datasets, together with the development of systematic workflows for PPI network-based data integration toward protein and pathway prioritization. Here, we pursued this analysis for blood pressure (BP) regulation.
The BP-GWAS meta-database was implemented in Microsoft SQL Server; its relational scheme enabled the combined storage of GWAS data and attributes mined from the GWAS Catalog and the literature, Ensembl-defined SNP-transcript associations, and GTEx eQTL data. The BP-protein interactome was reconstructed from the PICKLE PPI meta-database by extending the GWAS-deduced network with the shortest paths that connect all GWAS-proteins into one component. The shortest-path intermediates were considered BP-related. For protein prioritization, we combined a new integrated GWAS-based scoring scheme with two network-based criteria: one considering the protein's role in the reconstructed-by-shortest-path (RbSP) interactome, and a novel one promoting the common neighbors of GWAS-prioritized proteins. Prioritized proteins were ranked by the number of criteria they satisfied.
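The shortest-path extension and the criteria-count ranking described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy protein labels (A, B, C, X, Y) and the criterion names are hypothetical, the graph is a plain adjacency dictionary rather than the PICKLE network, and only one shortest path per seed pair is followed, whereas the actual workflow may consider all of them.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, source, target):
    """Return one shortest path found by breadth-first search, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adj.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def extend_with_shortest_paths(adj, seeds):
    """Collect non-seed proteins lying on a shortest path between seed pairs."""
    seeds = set(seeds)
    intermediates = set()
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(adj, a, b)
        if path:
            intermediates.update(p for p in path[1:-1] if p not in seeds)
    return intermediates


def rank_by_criteria(criteria):
    """Rank proteins by the number of criterion sets that contain them."""
    counts = {}
    for members in criteria.values():
        for protein in members:
            counts[protein] = counts.get(protein, 0) + 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy network (hypothetical): seeds A, B, C are joined only through X and Y.
toy_adj = build_adjacency([("A", "X"), ("X", "B"), ("X", "Y"), ("Y", "C")])
print(extend_with_shortest_paths(toy_adj, ["A", "B", "C"]))

# Hypothetical criterion sets standing in for the three prioritization criteria.
print(rank_by_criteria({
    "gwas_score": {"A", "B"},
    "network_role": {"A", "X"},
    "common_neighbor": {"A", "X"},
}))
```

In this toy case the intermediates X and Y play the role of the network-deduced proteins, and a protein found in all three hypothetical criterion sets ranks first, mirroring how ranking by the number of satisfied criteria works in principle.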
The meta-database includes 6687 variants linked with 1167 BP-associated protein-coding genes. The GWAS-deduced PPI network includes 1065 proteins, 672 of which form a connected component. The RbSP interactome contains 1443 additional, network-deduced proteins and indicates that essentially all BP-GWAS proteins are at most second neighbors. The prioritized BP-protein set was derived as the union of the proteins ranked most BP-significant by any of the GWAS-based or network-based criteria. It includes 335 proteins, about two thirds of which were deduced from the BP PPI network extension and 126 of which were prioritized by at least two criteria. ESR1 was the only protein satisfying all three criteria, followed in the top 10 by INSR, PTN11, CDK6, CSK, NOS3, SH2B3, ATP2B1, FES and FINC, each satisfying two. Pathway analysis of the RbSP interactome revealed numerous bioprocesses that have functional support as BP-associated, extending our understanding of BP regulation.
The implemented workflow could be used for other multifactorial diseases.