Li Binglan, Ritchie Marylyn D
Department of Biomedical Data Science, Stanford University, Stanford, CA, United States.
Department of Genetics, University of Pennsylvania, Philadelphia, PA, United States.
Front Genet. 2021 Sep 30;12:713230. doi: 10.3389/fgene.2021.713230. eCollection 2021.
Since their inception, genome-wide association studies (GWAS) have identified more than a hundred thousand single nucleotide polymorphism (SNP) loci associated with complex human diseases or traits. Most GWAS discoveries lie in non-coding regions of the human genome and have unknown functions. The gap between non-coding GWAS discoveries and the downstream genes they affect hinders the investigation of complex disease mechanisms and the use of human genetics to improve clinical care. Meanwhile, advances in high-throughput sequencing technologies have revealed the important regulatory roles that non-coding regions play in the transcriptional activity of genes. In this review, we focus on data-integrative bioinformatics methods that combine GWAS with functional genomics knowledge to identify genetically regulated genes. We categorize and describe two types of data-integrative methods. First, we describe fine-mapping methods. Fine-mapping is an exploratory approach that prioritizes the likely causal variants underlying GWAS signals. Fine-mapping methods connect GWAS signals to potentially causal genes through statistical methods and/or functional annotations. Second, we discuss gene-prioritization methods. These hypothesis-generating approaches, which include colocalization, Mendelian randomization, and the transcriptome-wide association study (TWAS), evaluate whether genetic variants influence complex traits by regulating genes through specific genetic regulatory mechanisms. TWAS is a gene-based association approach that investigates associations between genetically regulated gene expression and complex diseases or traits. TWAS has gained popularity over the years because it reduces the multiple-testing burden relative to variant-based analytic approaches. Over the past 5 years, multiple types of TWAS methods have been developed with varied methodological designs and biological hypotheses. We discuss how TWAS methods differ in many respects and the challenges that different TWAS methods face. Overall, TWAS is a powerful tool for identifying complex trait-associated genes. With the advent of single-cell sequencing, chromosome conformation capture, gene editing technologies, and multiplexed reporter assays, we expect a more comprehensive understanding of genomic regulation and of the genetically regulated genes underlying complex human diseases and traits in the future.
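To make the TWAS idea in the abstract concrete, the following is a minimal sketch on simulated data, not the method of any particular TWAS tool. It assumes a linear expression model: genetically regulated expression is a weighted sum of SNP allele dosages, and that predicted expression is then tested for association with the trait. The weights, sample size, effect size, and function names are all hypothetical. The association test is a simple Fisher-transformed correlation chosen for brevity; published methods typically use regression frameworks. The last lines illustrate the multiple-testing point with assumed round numbers (about 20,000 genes versus about 1,000,000 variants).

```python
import math
import random


def predict_expression(genotypes, weights):
    """Predicted (genetically regulated) expression: weighted sum of allele dosages."""
    return [sum(w * g for w, g in zip(weights, person)) for person in genotypes]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def association_z(x, y):
    """Approximate z-statistic via the Fisher transformation of the correlation."""
    return math.atanh(pearson_r(x, y)) * math.sqrt(len(x) - 3)


def two_sided_p(z):
    """Two-sided p-value under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Simulated cohort (all values are illustrative assumptions).
rng = random.Random(0)
n_people, n_snps = 500, 5
weights = [0.5, -0.3, 0.0, 0.2, 0.0]  # hypothetical per-SNP expression weights
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n_people)]

# Step 1: impute genetically regulated expression for one gene.
grex = predict_expression(genotypes, weights)

# Step 2: simulate a trait that depends on expression, then test the association.
trait = [0.8 * e + rng.gauss(0, 1) for e in grex]
z = association_z(grex, trait)
p = two_sided_p(z)

# Gene-level testing needs far fewer tests than variant-level testing.
gene_level = bonferroni_threshold(0.05, 20_000)
variant_level = bonferroni_threshold(0.05, 1_000_000)

print(f"z = {z:.2f}, p = {p:.3g}")
print(f"gene-level threshold = {gene_level:.3g}, variant-level threshold = {variant_level:.3g}")
```

Because one test is run per gene rather than per variant, the Bonferroni threshold in this sketch is less stringent at the gene level, which is the reduction in multiple-testing burden the abstract refers to.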