Department of Crop and Animal Sciences, Humboldt-Universität zu Berlin, Invalidenstrasse 42, 10115 Berlin, Germany.
BMC Genomics. 2010 Feb 1;11:80. doi: 10.1186/1471-2164-11-80.
High-density genotyping arrays have become established as a valuable research tool in human genetics. To date, more than 300 genome-wide association studies (GWAS) have been published for humans, reporting about 1,000 single nucleotide polymorphisms (SNPs) that are associated with a phenotype. In animal sciences, too, high-density genotyping arrays are used to analyse genetic variation. To exploit the full potential of this technology, the SNPs on the chips should be well characterized and their chromosomal positions should be precisely known. This is a challenge, however, if the genome sequence is still subject to change.
We have developed a mapping strategy and a suite of software scripts to update the chromosomal positions of oligomer sequences used for SNP genotyping on high-density arrays. We describe the mapping procedure in detail so that scientists with moderate bioinformatics skills can reproduce it. We furthermore present a case study in which we re-mapped 54,001 oligomer sequences from Illumina's BovineSNP50 BeadChip to the bovine genome sequence. In 992 cases we found substantial discrepancies between the manufacturer's annotations and our results. The software scripts, written in the Perl and R programming languages, are provided as supplements.
The positions of oligomer sequences in the genome are volatile even within one build of the genome. To facilitate the analysis of data from a GWAS or from an expression study, especially for species whose genome assembly is still unstable, we recommend updating the oligomer positions before data analysis.
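The comparison between manufacturer annotations and re-mapped positions can be illustrated with a minimal sketch. It is not the authors' Perl/R implementation: the function name `find_discrepancies`, the `Position` record, the `tolerance_bp` parameter, and the SNP identifiers are all hypothetical, and the abstract does not state what counts as a "substantial" discrepancy. The sketch simply assumes that a SNP is flagged when it could not be re-mapped, when it maps to a different chromosome, or when its position differs by more than a chosen tolerance.

```python
from typing import Dict, List, NamedTuple, Optional


class Position(NamedTuple):
    """Chromosomal position of an oligomer (hypothetical record layout)."""
    chromosome: str
    base_pair: int


def find_discrepancies(
    annotated: Dict[str, Position],
    remapped: Dict[str, Optional[Position]],
    tolerance_bp: int = 0,
) -> List[str]:
    """Return IDs of SNPs whose re-mapped position disagrees with the annotation.

    A SNP is flagged if it has no re-mapped position, if it maps to a
    different chromosome, or if the base-pair position differs by more than
    tolerance_bp. These criteria are assumptions made for illustration.
    """
    discrepant = []
    for snp_id, old in annotated.items():
        new = remapped.get(snp_id)
        if new is None:
            # No hit for this oligomer in the current assembly.
            discrepant.append(snp_id)
        elif new.chromosome != old.chromosome:
            discrepant.append(snp_id)
        elif abs(new.base_pair - old.base_pair) > tolerance_bp:
            discrepant.append(snp_id)
    return discrepant


# Made-up example data; the identifiers and coordinates are not real.
annotated = {
    "snp_a": Position("1", 1000),
    "snp_b": Position("2", 5000),
    "snp_c": Position("3", 700),
    "snp_d": Position("4", 42),
}
remapped = {
    "snp_a": Position("1", 1000),   # unchanged
    "snp_b": Position("2", 5250),   # shifted by 250 bp
    "snp_c": Position("7", 700),    # different chromosome
    "snp_d": None,                  # could not be re-mapped
}
```

With these inputs, `find_discrepancies(annotated, remapped)` flags `snp_b`, `snp_c`, and `snp_d`, while a tolerance of 500 bp leaves only `snp_c` and `snp_d`. A suitable tolerance depends on the downstream analysis and is left to the user.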