Tak Yu Gyoung, Farnham Peggy J
Department of Biochemistry and Molecular Biology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089 USA.
Epigenetics Chromatin. 2015 Dec 30;8:57. doi: 10.1186/s13072-015-0050-4. eCollection 2015.
Considerable progress towards understanding complex diseases has been made in recent years owing to the development of high-throughput genotyping technologies. Using microarrays that contain millions of single-nucleotide polymorphisms (SNPs), genome-wide association studies (GWASs) have identified SNPs associated with many complex diseases or traits. For example, as of February 2015, 2111 association studies had identified 15,396 SNPs for various diseases and traits, and the number of identified SNP-disease/trait associations has increased rapidly in recent years. However, it has been difficult to translate GWAS results into an understanding of disease risk, because most GWAS-identified SNPs are located in non-coding regions of the genome. It is important to consider that a GWAS-identified SNP serves only as a representative of all SNPs in the same haplotype block; any other SNP in high linkage disequilibrium (LD) with the array-identified SNP is equally likely to be causal for the disease. In the hope that disease-associated coding variants would emerge once the true causal SNPs were known, investigators have expanded their analyses using LD calculation and fine-mapping. However, such analyses also identified risk-associated SNPs located in non-coding regions. Thus, the GWAS field has been left with the conundrum of how a single-nucleotide change in a non-coding region could confer increased risk for a specific disease. One possible answer is that the variant SNPs change gene expression levels rather than protein function.
This review provides a description of (1) advances in genomic and epigenomic approaches that incorporate functional annotation of regulatory elements to prioritize the disease risk-associated SNPs that are located in non-coding regions of the genome for follow-up studies, (2) various computational tools that aid in identifying gene expression changes caused by the non-coding disease-associated SNPs, and (3) experimental approaches to identify target genes of, and study the biological phenotypes conferred by, non-coding disease-associated SNPs.
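The LD calculation mentioned above can be illustrated with a minimal sketch. It assumes phased haplotype counts for two biallelic SNPs are already available and computes the standard squared-correlation measure r² = D² / (p_A(1 − p_A)p_B(1 − p_B)), where D = p_AB − p_A·p_B. The function name `ld_r2` and the example counts are hypothetical and are not taken from the review.

```python
def ld_r2(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Return r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB, n_Ab, n_aB, n_ab are the numbers of haplotypes carrying each
    allele combination (A/a at the first SNP, B/b at the second).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B

    # D measures the departure from independent assortment of the alleles.
    d = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical counts: the two SNPs are always inherited together.
    print(ld_r2(50, 0, 0, 50))
    # Hypothetical counts: the two SNPs are statistically independent.
    print(ld_r2(25, 25, 25, 25))
```

An r² close to 1 means the array-identified SNP and its neighbour are nearly interchangeable as association signals, which is why statistical association alone cannot single out the causal variant within a haplotype block.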