A simple and fast two-locus quality control test to detect false positives due to batch effects in genome-wide association studies.

Affiliation

Queensland Institute of Medical Research, Herston, Queensland, Australia.

Publication

Genet Epidemiol. 2010 Dec;34(8):854-62. doi: 10.1002/gepi.20541.

Abstract

Erroneous genotypes that pass standard quality control (QC) can have a severe impact on genome-wide association studies, on genotype imputation, and on the estimation of heritability and prediction of genetic risk from single nucleotide polymorphisms (SNPs). To detect such genotyping errors, a simple two-locus QC method was developed and applied, based on the difference between the association test statistics for single SNPs and for pairs of SNPs. In real data, the proposed approach detected many problematic SNPs with statistical significance, even when standard single-SNP QC analyses failed to detect them. Depending on the data set, the number of erroneous SNPs that passed standard single-SNP QC but were detected by the proposed approach ranged from a few hundred to several thousand. In simulated data, the proposed method was powerful and outperformed the other existing methods tested; its power to detect erroneous genotypes was approximately 80% at an error rate of 3% per SNP. This novel QC approach is easy to implement and computationally efficient, and it can improve the quality of the genotypes used in subsequent genotype-phenotype investigations.
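The abstract does not give the exact statistic, so the following is only a minimal sketch of the single-SNP versus SNP-pair comparison, under assumptions of our own: a case-control design, genotypes coded 0/1/2, and a likelihood-ratio (G) test of association in place of whatever statistic the authors actually used. The G test is chosen here because it partitions exactly for nested tables, so the difference D = G(pair) - G(single) is itself a sum of G statistics, approximately chi-square with degrees of freedom equal to the number of extra non-empty genotype-combination columns. All function names (`chi2_sf`, `g_statistic`, `tabulate`, `two_locus_check`) are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df <= 0 or x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) plus half-integer gamma recurrence terms
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def tabulate(keys, status):
    """Map each genotype (or genotype pair) to (case_count, control_count)."""
    cells = {}
    for key, s in zip(keys, status):
        case, ctrl = cells.get(key, (0, 0))
        cells[key] = (case + 1, ctrl) if s == 1 else (case, ctrl + 1)
    return cells


def g_statistic(cells):
    """Likelihood-ratio (G) statistic for a 2 x c case-control table.

    Returns (G, number of non-empty columns); 0 * log(0) is treated as 0.
    """
    cols = [(a, b) for a, b in cells.values() if a + b > 0]
    n_case = sum(a for a, _ in cols)
    n_ctrl = sum(b for _, b in cols)
    n = n_case + n_ctrl
    g = 0.0
    for a, b in cols:
        col = a + b
        for obs, row in ((a, n_case), (b, n_ctrl)):
            if obs > 0:
                g += 2.0 * obs * math.log(obs * n / (row * col))
    return g, len(cols)


def two_locus_check(geno_a, geno_b, status):
    """Compare the pair (A, B) association statistic with that of SNP A alone.

    Returns (D, df, p). A large D means the A/B genotype combinations differ
    between cases and controls beyond what SNP A alone explains, which a
    batch-specific calling error could produce.
    """
    g_single, c_single = g_statistic(tabulate(geno_a, status))
    g_pair, c_pair = g_statistic(tabulate(list(zip(geno_a, geno_b)), status))
    d = max(g_pair - g_single, 0.0)  # guard against tiny negative rounding
    df = c_pair - c_single
    return d, df, chi2_sf(d, df)


if __name__ == "__main__":
    # Toy data: A and B are perfectly concordant in controls, but 10 cases
    # carry an A/B combination never seen in controls.
    controls = [(0, 0)] * 50 + [(1, 1)] * 50
    cases = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 1)] * 50
    pairs = cases + controls
    status = [1] * len(cases) + [0] * len(controls)
    d, df, p = two_locus_check([a for a, _ in pairs], [b for _, b in pairs], status)
    print(f"D = {d:.2f}, df = {df}, p = {p:.2e}")
```

In this toy example SNP A alone shows no case-control difference (G = 0), yet D is roughly 15 on 1 df (p below 0.001), because the discordant A/B combination appears only in cases. A real pipeline would also need to choose which SNP pairs to test and how to correct for the number of tests; the sketch leaves both open.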

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c4b/3674525/637edf95fc26/gepi0034-0854-f1.jpg

Similar articles

1
A quality control algorithm for filtering SNPs in genome-wide association studies.
Bioinformatics. 2010 Jul 15;26(14):1731-7. doi: 10.1093/bioinformatics/btq272. Epub 2010 May 25.
2
Prioritize and select SNPs for association studies with multi-stage designs.
J Comput Biol. 2008 Apr;15(3):241-57. doi: 10.1089/cmb.2007.0090.
3
SNP genotype calling and quality control for multi-batch-based studies.
Genes Genomics. 2019 Aug;41(8):927-939. doi: 10.1007/s13258-019-00827-5. Epub 2019 May 6.
4
Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.
Hum Genet. 2013 May;132(5):509-22. doi: 10.1007/s00439-013-1266-7. Epub 2013 Jan 22.
5
Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering.
BMC Bioinformatics. 2014 Apr 10;15:102. doi: 10.1186/1471-2105-15-102.
6
Analysis of untyped SNPs: maximum likelihood and imputation methods.
Genet Epidemiol. 2010 Dec;34(8):803-15. doi: 10.1002/gepi.20527.
7
HiSSI: high-order SNP-SNP interactions detection based on efficient significant pattern and differential evolution.
BMC Med Genomics. 2019 Dec 30;12(Suppl 7):139. doi: 10.1186/s12920-019-0584-6.
8
GWIS--model-free, fast and exhaustive search for epistatic interactions in case-control GWAS.
BMC Genomics. 2013;14 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2164-14-S3-S10. Epub 2013 May 28.

Cited by

1
Elevated risk of invasive group A streptococcal disease and host genetic variation in the human leucocyte antigen locus.
Genes Immun. 2020 Jan;21(1):63-70. doi: 10.1038/s41435-019-0082-z. Epub 2019 Aug 29.
2
Local Joint Testing Improves Power and Identifies Hidden Heritability in Association Studies.
Genetics. 2016 Jul;203(3):1105-16. doi: 10.1534/genetics.116.188292. Epub 2016 May 6.
3
Shared genetics underlying epidemiological association between endometriosis and ovarian cancer.
Hum Mol Genet. 2015 Oct 15;24(20):5955-64. doi: 10.1093/hmg/ddv306. Epub 2015 Jul 30.
4
Most common 'sporadic' cancers have a significant germline genetic component.
Hum Mol Genet. 2014 Nov 15;23(22):6112-8. doi: 10.1093/hmg/ddu312. Epub 2014 Jun 18.
5
Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data.
J Data Mining Genomics Proteomics. 2013 Oct 20;4. doi: 10.4172/2153-0602.1000143.
6
GWIS--model-free, fast and exhaustive search for epistatic interactions in case-control GWAS.
BMC Genomics. 2013;14 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2164-14-S3-S10. Epub 2013 May 28.
7
Estimating the genetic variance of major depressive disorder due to all single nucleotide polymorphisms.
Biol Psychiatry. 2012 Oct 15;72(8):707-9. doi: 10.1016/j.biopsych.2012.03.011. Epub 2012 Apr 19.
8
Impact of diagnostic misclassification on estimation of genetic correlations using genome-wide genotypes.
Eur J Hum Genet. 2012 Jun;20(6):668-74. doi: 10.1038/ejhg.2011.257. Epub 2012 Jan 18.
9
Estimating missing heritability for disease from genome-wide association studies.
Am J Hum Genet. 2011 Mar 11;88(3):294-305. doi: 10.1016/j.ajhg.2011.02.002. Epub 2011 Mar 3.
