Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, Sichuan, China.
Department of Agronomy, Iowa State University, Ames, IA, USA.
Methods Mol Biol. 2022;2481:63-80. doi: 10.1007/978-1-0716-2237-7_5.
With increasing marker density, estimating the recombination rate between a marker and a causal mutation through linkage analysis becomes less important. Instead, linkage disequilibrium (LD) becomes the major indicator for gene mapping through genome-wide association studies (GWAS). Besides linkage between the marker and the causal mutation, many other factors can contribute to LD, including population structure and cryptic relationships among individuals. Statistical methods and software continue to evolve to improve statistical power and computing speed in GWAS. Their outputs must evolve with them, so that users can interpret the input data, the analytical process, and the final association results. In this chapter, our descriptions focus on (1) considerations in creating a Manhattan plot that displays the strength of LD and the locations of markers across a genome; (2) criteria for the genome-wide significance threshold, and the different appearance of Manhattan plots under single-locus and multiple-locus models; (3) exploration of population structure and kinship among individuals; (4) the quantile-quantile (QQ) plot; (5) LD decay across the genome, and LD between the associated markers and their neighbors; and (6) exploration of individual and marker information on Manhattan and QQ plots through interactive visualization in HTML. The ultimate objective of this chapter is to help users connect input data to GWAS outputs so as to balance power against false positives. It also aims to help users connect GWAS outputs to the selection of candidate genes using the extent of LD.
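Several of the quantities named above, including the genome-wide threshold, the QQ-plot coordinates, and marker-to-marker LD and its decay with distance, can be illustrated with a small sketch. This is not the chapter's method. It makes the following assumptions: the genome-wide threshold is a Bonferroni correction at α = 0.05; the expected QQ quantiles use uniform plotting positions (i − 0.5)/m; and LD is measured as r², computed as the squared Pearson correlation of 0/1/2 genotype codes, which is a common approximation for unphased data. All function names are hypothetical.

```python
import math


def bonferroni_threshold(n_markers, alpha=0.05):
    """Genome-wide threshold on the -log10(p) scale (assumes Bonferroni correction)."""
    return -math.log10(alpha / n_markers)


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Assumes p-values are Uniform(0, 1) under the null hypothesis and uses
    plotting positions (i - 0.5) / m. Both axes are sorted ascending so the
    i-th expected quantile is paired with the i-th observed one.
    """
    m = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / m) for i in range(1, m + 1))
    return list(zip(expected, observed))


def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two markers' genotype codes (0/1/2)."""
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # monomorphic marker: r^2 is undefined
    return cov * cov / (var_a * var_b)


def ld_decay(positions, genotypes, bin_size):
    """Mean pairwise r^2 per physical-distance bin on one chromosome.

    positions: marker positions (e.g. bp); genotypes: one 0/1/2 list per marker.
    Returns {bin_start: mean_r2}. Checks every pair (O(n^2)), so this is
    suitable for illustration only. Pairs involving a monomorphic marker are skipped.
    """
    sums, counts = {}, {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r2 = ld_r2(genotypes[i], genotypes[j])
            if math.isnan(r2):
                continue
            start = (abs(positions[j] - positions[i]) // bin_size) * bin_size
            sums[start] = sums.get(start, 0.0) + r2
            counts[start] = counts.get(start, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


if __name__ == "__main__":
    # 0.05 / 1,000,000 = 5e-8, i.e. about 7.3 on the -log10 scale
    print(round(bonferroni_threshold(1_000_000), 2))  # prints 7.3
    print(qq_points([0.5, 0.01, 0.2, 0.9]))
    print(ld_decay([0, 100, 250], [[0, 1, 2, 1], [0, 1, 2, 1], [1, 0, 2, 1]], 200))
```

In a real analysis, observed points that rise above the diagonal across the whole QQ plot, rather than only in the tail, usually point to confounding such as the population structure or cryptic relatedness mentioned above. How quickly the binned r² falls with distance gives a rough window for where candidate genes near an associated marker may lie.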