Quality control and quality assurance in genotypic data for genome-wide association studies.

Affiliation

Department of Biostatistics, University of Washington, Seattle, Washington, USA.

Publication

Genet Epidemiol. 2010 Sep;34(6):591-602. doi: 10.1002/gepi.20516.

DOI: 10.1002/gepi.20516
PMID: 20718045
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3061487/
Abstract

Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium test P-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the "Gene Environment Association Studies" (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS.
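Item (5) above detects genotyping artifacts from the way Hardy-Weinberg equilibrium (HWE) test P-values depend on allele frequency. The Python sketch below shows one way to tabulate that dependence; it is not the authors' code. It assumes per-SNP genotype counts are already tallied as `(n_het, n_hom1, n_hom2)`, uses the exact HWE test described by Wigginton, Cutler and Abecasis (2005) as a stand-in for whichever test the paper applies, and the function names, minor allele frequency (MAF) bin edges and the 1e-4 threshold are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed input: per-SNP genotype counts (n_het, n_hom1, n_hom2).
Counts = Tuple[int, int, int]


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided exact HWE p-value for one biallelic SNP (Wigginton-style)."""
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        raise ValueError("SNP has no called genotypes")
    rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the minor allele
    probs = [0.0] * (rare + 1)
    # Start near the expected heterozygote count; it must share rare's parity.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0  # unnormalized; rescaled by the total below
    # Walk down: each step turns two heterozygotes into one homozygote of each type.
    hom_r = (rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1
    # Walk up from the same starting point.
    hom_r = (rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1
    total = sum(probs)
    p_obs = probs[n_het]
    # Sum the probabilities of all configurations no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


def minor_allele_freq(n_het: int, n_hom1: int, n_hom2: int) -> float:
    n = n_het + n_hom1 + n_hom2
    return (2 * min(n_hom1, n_hom2) + n_het) / (2 * n)


def low_p_by_maf(
    snps: Sequence[Counts],
    edges: Sequence[float] = (0.0, 0.05, 0.1, 0.2, 0.5),  # hypothetical bins
    threshold: float = 1e-4,  # hypothetical cutoff
) -> List[Tuple[float, float, int, int]]:
    """For each MAF bin return (lo, hi, n_snps, n_with_p_below_threshold)."""
    bins = [[edges[i], edges[i + 1], 0, 0] for i in range(len(edges) - 1)]
    for counts in snps:
        f = minor_allele_freq(*counts)
        p = hwe_exact_p(*counts)
        for i, b in enumerate(bins):
            last = i == len(bins) - 1
            if b[0] <= f < b[1] or (last and f == b[1]):  # last bin includes MAF = 0.5
                b[2] += 1
                b[3] += int(p < threshold)
                break
    return [tuple(b) for b in bins]


if __name__ == "__main__":
    example = [(50, 25, 25), (100, 0, 0), (0, 0, 10)]
    for lo, hi, n_snps, n_low in low_p_by_maf(example):
        print(f"MAF [{lo}, {hi}): {n_low}/{n_snps} SNPs with HWE p < 1e-4")
```

A MAF bin whose share of low P-values departs from its neighbours is a cue to inspect those SNPs more closely; the sketch only tabulates the counts and leaves that judgement to the analyst.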

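Item (4) in the abstract uses concordance between duplicate samples to judge SNP quality. A minimal sketch of the counting step follows, assuming genotypes coded as 0/1/2 copies of one allele with `None` for a missing call and a known list of duplicate sample pairs. The helper names `duplicate_discordance` and `flag_snps`, the sample IDs, and the cutoff of more than one discordant pair are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Genotype coded as 0, 1 or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def duplicate_discordance(
    calls: Dict[str, Sequence[Genotype]],
    pairs: Sequence[Tuple[str, str]],
) -> List[Tuple[int, int]]:
    """Per SNP, return (n_discordant_pairs, n_pairs_compared) over duplicate pairs.

    Assumes every sample's call vector lists the same SNPs in the same order.
    """
    n_snps = len(next(iter(calls.values())))
    tally = [[0, 0] for _ in range(n_snps)]
    for a, b in pairs:
        for j, (ga, gb) in enumerate(zip(calls[a], calls[b])):
            if ga is None or gb is None:
                continue  # a missing call cannot be compared
            tally[j][1] += 1
            if ga != gb:
                tally[j][0] += 1
    return [(d, c) for d, c in tally]


def flag_snps(discordance: Sequence[Tuple[int, int]], max_discordant: int = 1) -> List[int]:
    """Indices of SNPs with more than `max_discordant` discordant duplicate pairs."""
    return [j for j, (d, _) in enumerate(discordance) if d > max_discordant]


if __name__ == "__main__":
    calls = {
        "s1": [0, 1, 2, None],
        "s1_dup": [0, 2, 2, 1],
        "s2": [1, 1, 0, 2],
        "s2_dup": [1, 0, 0, 2],
    }
    disc = duplicate_discordance(calls, [("s1", "s1_dup"), ("s2", "s2_dup")])
    print(disc)
    print(flag_snps(disc))
```

The number of comparable pairs is returned next to each discordance count so that a caller can express the cutoff as a rate when missing calls leave different SNPs with different numbers of usable pairs.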
