Comparing genotyping algorithms for Illumina's Infinium whole-genome SNP BeadChips.

Affiliation

Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia.

Publication information

BMC Bioinformatics. 2011 Mar 8;12:68. doi: 10.1186/1471-2105-12-68.

DOI:10.1186/1471-2105-12-68

PMID:21385424

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3063825/

Abstract

BACKGROUND

Illumina's Infinium SNP BeadChips are extensively used in both small and large-scale genetic studies. A fundamental step in any analysis is the processing of raw allele A and allele B intensities from each SNP into genotype calls (AA, AB, BB). Various algorithms which make use of different statistical models are available for this task. We compare four methods (GenCall, Illuminus, GenoSNP and CRLMM) on data where the true genotypes are known in advance and data from a recently published genome-wide association study.

RESULTS

In general, differences in accuracy are relatively small between the methods evaluated, although CRLMM and GenoSNP were found to consistently outperform GenCall. The performance of Illuminus is heavily dependent on sample size, with lower no-call rates and improved accuracy as the number of samples available increases. For X chromosome SNPs, methods with sex-dependent models (Illuminus, CRLMM) perform better than methods which ignore gender information (GenCall, GenoSNP). We observe that CRLMM and GenoSNP are more accurate at calling SNPs with low minor allele frequency than GenCall or Illuminus. The sample quality metrics from each of the four methods were found to have a high level of agreement in flagging samples with unusual signal characteristics.
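The two quantities these results turn on, no-call rate and accuracy against known genotypes, can be illustrated with a minimal Python sketch. This is not the evaluation code used in the study: the function name `call_metrics`, the use of `None` to represent a no call, and the toy genotypes are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def call_metrics(
    calls: Sequence[Optional[str]], truth: Sequence[str]
) -> Tuple[float, float]:
    """Return (no_call_rate, accuracy_among_called) for one set of SNP calls.

    Assumption: a no call is encoded as None; genotypes are "AA", "AB" or "BB".
    """
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")
    if not calls:
        raise ValueError("at least one genotype is required")

    # Keep only positions where the algorithm committed to a genotype.
    called = [(c, t) for c, t in zip(calls, truth) if c is not None]

    no_call_rate = (len(calls) - len(called)) / len(calls)
    # Accuracy is measured over called genotypes only; undefined if none were called.
    accuracy = (
        sum(c == t for c, t in called) / len(called) if called else float("nan")
    )
    return no_call_rate, accuracy


# Toy data (hypothetical): five SNPs with known genotypes.
truth = ["AA", "AB", "BB", "AA", "AB"]
calls = ["AA", "AB", None, "AA", "BB"]  # one no call, one wrong call

no_call_rate, accuracy = call_metrics(calls, truth)
print(f"no-call rate = {no_call_rate:.2f}, accuracy = {accuracy:.2f}")
```

Reporting the two numbers separately matters because a method can raise its apparent accuracy simply by declining to call difficult SNPs, so a comparison across methods should look at both.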

CONCLUSIONS

CRLMM, GenoSNP and GenCall can be applied with confidence in studies of any size, as their performance was shown to be invariant to the number of samples available. Illuminus, on the other hand, requires a larger number of samples to achieve comparable levels of accuracy, and its use in smaller studies (50 or fewer individuals) is not recommended.
