Bioinformatics Research Group in Epidemiology of ISGlobal.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac043.
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. Individually, however, each SNP explains only a small proportion of the phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historical recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on SNPs across the genome, which allows them to be studied in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, both to complement genome-wide association studies and to determine how much these structures contribute to phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data, including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability; and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. Based on a systematic review of recently published applications of these methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help researchers make the most of the large amount of publicly available SNP data.
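As a rough illustration of two of the quantities named in the abstract, the sketch below computes a polygenic risk score as a weighted sum of allele dosages and a genotype-based approximation of linkage disequilibrium (the squared Pearson correlation between the dosages of two SNPs). It is not taken from any of the reviewed tools: the function names `polygenic_score` and `ld_r2`, the toy genotype matrix and the effect weights are all assumptions made for this example, and real tools typically also handle missing genotypes, allele alignment and haplotype phase.

```python
from statistics import mean


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) for one individual.

    `weights` are hypothetical per-SNP effect sizes, e.g. from a GWAS.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between the dosages of two SNPs.

    Each argument holds one dosage per individual. This genotype-based
    r^2 only approximates the haplotype-based LD statistic.
    """
    if len(snp_a) != len(snp_b):
        raise ValueError("both SNPs must cover the same individuals")
    mean_a, mean_b = mean(snp_a), mean(snp_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var_a * var_b)


# Toy data (made up): rows are individuals, columns are SNPs.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 2],
]
weights = [0.5, -0.25, 1.0]  # hypothetical effect sizes

scores = [polygenic_score(row, weights) for row in genotypes]

# Dosage vectors of the first two SNPs across individuals.
snp0 = [row[0] for row in genotypes]
snp1 = [row[1] for row in genotypes]
r2_01 = ld_r2(snp0, snp1)
```

Here `scores` holds one value per individual and `r2_01` lies between 0 and 1, with values near 1 indicating that the two SNPs carry largely redundant information in this sample.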