KRLMM: an adaptive genotype calling method for common and low frequency variants.

Authors

Liu Ruijie, Dai Zhiyin, Yeager Meredith, Irizarry Rafael A, Ritchie Matthew E

Affiliation

Molecular Medicine Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia.

Publication

BMC Bioinformatics. 2014 May 23;15:158. doi: 10.1186/1471-2105-15-158.

Abstract

BACKGROUND

SNP genotyping microarrays have revolutionized the study of complex disease. The current range of commercially available genotyping products contains extensive catalogues of low frequency and rare variants. Existing SNP calling algorithms have difficulty dealing with these low frequency variants, as the underlying models rely on each genotype having a reasonable number of observations to ensure accurate clustering.

RESULTS

Here we develop KRLMM, a new method for converting raw intensities into genotype calls that aims to overcome this issue. Our method is unique in that it applies careful between-sample normalization and allows a variable number of clusters k (1, 2 or 3) for each SNP, where k is predicted using the available data. We compare our method to four genotyping algorithms (GenCall, GenoSNP, Illuminus and OptiCall) on several Illumina data sets that include samples from the HapMap project where the true genotypes are known in advance. All methods were found to have high overall accuracy (>98%), with KRLMM consistently amongst the best. At low minor allele frequency, the KRLMM, OptiCall and GenoSNP algorithms were observed to be consistently more accurate than GenCall and Illuminus on our test data.
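To illustrate what a variable number of clusters per SNP means in practice, the sketch below estimates k (1, 2 or 3) for a single SNP from per-sample log-ratios of the two allele intensities. This is not the KRLMM predictor, which the abstract does not specify: the gap-based rule, the function name `estimate_k` and the `min_gap` threshold are all assumptions made for illustration only.

```python
def estimate_k(log_ratios, min_gap=1.0, max_k=3):
    """Estimate the number of genotype clusters for one SNP.

    Hypothetical heuristic (not the KRLMM model): sort the per-sample
    log-ratios and start a new cluster wherever two neighbouring values
    are more than `min_gap` apart, keeping at most `max_k - 1` of the
    widest such gaps. Returns (k, labels) with labels in input order.
    """
    order = sorted(range(len(log_ratios)), key=lambda i: log_ratios[i])
    ordered = [log_ratios[i] for i in order]

    # (gap width, position) for every pair of neighbouring sorted values
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    wide = sorted((g for g in gaps if g[0] > min_gap), reverse=True)[: max_k - 1]
    breaks = sorted(pos for _, pos in wide)

    # Walk the sorted samples, moving to the next cluster after each break
    labels = [0] * len(log_ratios)
    cluster = 0
    next_break = 0
    for rank, sample in enumerate(order):
        labels[sample] = cluster
        if next_break < len(breaks) and rank == breaks[next_break]:
            cluster += 1
            next_break += 1
    return len(breaks) + 1, labels


# Made-up log-ratios: a common SNP with three visible clusters,
# and a low frequency SNP where only one sample carries the minor allele.
common_snp = [-3.1, -2.9, 0.1, -0.2, 3.0, 2.8, 0.0]
rare_snp = [-3.0, -2.9, -3.1, -3.0, 0.1, -2.8]

print(estimate_k(common_snp)[0])  # 3
print(estimate_k(rare_snp)[0])    # 2
```

A caller that always fits three clusters would have to place an empty or near-empty cluster for the second SNP; letting k vary avoids that, which is the motivation the abstract gives for the approach.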

CONCLUSIONS

Methods that tailor their approach to calling low frequency variants by either varying the number of clusters (KRLMM) or using information from other SNPs (OptiCall and GenoSNP) offer improved accuracy over methods that do not (GenCall and Illuminus). The KRLMM algorithm is implemented in the open-source crlmm package distributed via the Bioconductor project (http://www.bioconductor.org).
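The comparison at low minor allele frequency can be made concrete with a small sketch that scores calls against known genotypes separately for low frequency and common SNPs. The genotype coding (0, 1, 2 copies of one allele), the 0.1 cutoff, the treatment of no-calls and the function names are assumptions for illustration; the abstract does not state how the authors binned SNPs or handled missing calls.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as allele counts (0, 1 or 2) per sample."""
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    return min(allele_freq, 1.0 - allele_freq)


def accuracy_by_maf(truth, calls, maf_cutoff=0.1):
    """Accuracy of `calls` against `truth`, split by MAF of the true genotypes.

    truth, calls: dicts mapping SNP id -> list of genotypes, one per sample.
    A call of None is treated as a no-call and skipped (an assumption).
    Returns {"low": accuracy, "common": accuracy}; None if a bin has no calls.
    """
    correct = {"low": 0, "common": 0}
    called = {"low": 0, "common": 0}
    for snp, true_genotypes in truth.items():
        maf = minor_allele_frequency(true_genotypes)
        bin_name = "low" if maf < maf_cutoff else "common"
        for true_g, call in zip(true_genotypes, calls[snp]):
            if call is None:
                continue
            called[bin_name] += 1
            correct[bin_name] += int(call == true_g)
    return {b: (correct[b] / called[b] if called[b] else None) for b in called}


# Made-up example: the caller misses the single heterozygote at the rare SNP.
truth = {
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "snp_common": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2],
}
calls = {
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "snp_common": [0, 1, 2, 1, 0, 2, 1, 1, 0, None],
}
print(accuracy_by_maf(truth, calls))
```

Reporting the two bins separately matters because an overall accuracy above 98% can hide systematic errors at rare SNPs, where almost every sample is homozygous for the major allele.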
