Cerezo Maria, Sollis Elliot, Ji Yue, Lewis Elizabeth, Abid Ala, Bircan Karatuğ Ozan, Hall Peggy, Hayhurst James, John Sajo, Mosaku Abayomi, Ramachandran Santhi, Foreman Amy, Ibrahim Arwa, McLaughlin James, Pendlington Zoë, Stefancsik Ray, Lambert Samuel A, McMahon Aoife, Morales Joannella, Keane Thomas, Inouye Michael, Parkinson Helen, Harris Laura W
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
bioRxiv. 2024 Oct 23:2024.10.23.619767. doi: 10.1101/2024.10.23.619767.
The NHGRI-EBI GWAS Catalog serves as a vital resource for the genetic research community, providing access to the most comprehensive database of human GWAS results. Currently, it contains close to 7,000 publications for more than 15,000 traits, from which more than 625,000 lead associations have been curated. Additionally, 85,000 full genome-wide summary statistics datasets (containing association data for all variants in the analysis) are available for downstream analyses such as meta-analysis, fine-mapping, Mendelian randomisation or development of polygenic risk scores. As a centralised repository for GWAS results, the GWAS Catalog sets and implements standards for data submission and harmonisation, and encourages the use of consistent descriptors for traits, samples and methodologies. We share processes and vocabulary with the PGS Catalog, improving interoperability for a growing user group. Here, we describe the latest changes in data content, improvements in our user interface, and the implementation of the GWAS-SSF standard format for summary statistics. We address the challenges of handling the rapid increase in large-scale molecular quantitative trait GWAS and the need for sensitivity in the use of population and cohort descriptors while maintaining data interoperability and reusability.