Genome-wide association studies: quality control and population-based measures.

Affiliation

Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Germany.

Publication information

Genet Epidemiol. 2009;33 Suppl 1(Suppl 1):S45-50. doi: 10.1002/gepi.20472.

DOI:10.1002/gepi.20472

PMID:19924716

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2996103/

Abstract

Genome-wide association studies, using hundreds of thousands of single-nucleotide polymorphism (SNP) markers, have become a standard approach for identifying disease susceptibility genes. This change in technology poses substantial computational and statistical challenges, which were addressed in the quality control, imputation, and population-based measure groups of the Genetic Analysis Workshop 16. The computational challenges pertain to efficient memory management and the computational speed of the statistical procedures, and we discuss an approach for efficient SNP storage. Accuracy and computational speed are relevant for genotype calling, and the results from a comparison of three calling algorithms are discussed. The first statistical challenge is related to statistical quality control, and we discuss two novel quality control procedures. These low-level analyses affect subsequent preparatory steps for high-level analyses, e.g., the quality of genotype imputation approaches. After a genome-wide association study has been conducted with successful replication and/or validation, measures of diagnostic accuracy, including the area under the curve, are investigated. The area under the curve can be constructed from summary data in some situations. Finally, we discuss how the population-attributable risk of a genetic variant that is only measured in a reference data set can be determined.
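As an illustration of how an area under the curve might be obtained from summary data, the following minimal sketch computes the AUC for a single SNP from genotype counts alone. It is not the procedure evaluated in the workshop contributions. It assumes a hypothetical setting in which the diagnostic score is simply the number of risk alleles (0, 1, or 2), and it uses the standard Mann-Whitney interpretation of the AUC: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name and the example counts are invented for illustration.

```python
from itertools import product


def auc_from_genotype_counts(case_counts, control_counts):
    """AUC of a risk-allele-count score, computed from genotype counts only.

    case_counts / control_counts: numbers of individuals carrying
    0, 1, and 2 risk alleles (hypothetical summary data).
    """
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")

    weighted_pairs = 0.0
    # Enumerate every (case genotype, control genotype) combination.
    for (g_case, a), (g_ctrl, b) in product(
        enumerate(case_counts), enumerate(control_counts)
    ):
        if g_case > g_ctrl:
            weighted_pairs += a * b        # case ranks above control
        elif g_case == g_ctrl:
            weighted_pairs += 0.5 * a * b  # tie counts as one half

    return weighted_pairs / (n_cases * n_controls)


# Made-up counts: identical genotype distributions give no discrimination.
print(auc_from_genotype_counts([1, 2, 1], [1, 2, 1]))
```

Because only the genotype-by-status counts enter the calculation, no individual-level data are needed in this simple single-marker setting. Scores that combine several markers would generally require joint genotype information.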
