Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples.

Author information

Hong Huixiao, Su Zhenqiang, Ge Weigong, Shi Leming, Perkins Roger, Fang Hong, Xu Joshua, Chen James J, Han Tao, Kaput Jim, Fuscoe James C, Tong Weida

Affiliation

Division of Systems Toxicology, National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.

Publication information

BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S17. doi: 10.1186/1471-2105-9-S9-S17.

Abstract

BACKGROUND

Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount, since errors introduced by calling algorithms can inflate the number of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS analyze multiple arrays together. Because a GWAS generates hundreds of gigabytes (GB) of raw data, the samples are typically partitioned into batches, each containing a subset of the entire dataset, for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., the number of chips analyzed together) and batch composition (i.e., the choice of chips in a batch) on call rate and accuracy, as well as the propagation of these effects to the significantly associated SNPs identified downstream, have not been investigated. In this paper, we analyzed the effects of both batch size and batch composition on the genotype calling algorithm BRLMM using raw data from 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
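Call rate, one of the two quality measures named above, is the fraction of genotypes that actually receive a call. The following minimal Python sketch is not taken from the study; it assumes a hypothetical integer encoding (0 = AA, 1 = AB, 2 = BB, -1 = no call) stored as a samples-by-SNPs matrix, and it does not read BRLMM output files. The function names are illustrative only.

```python
from typing import List

# Hypothetical encoding (an assumption, not BRLMM's file format):
# 0 = AA, 1 = AB, 2 = BB, -1 = no call.
NO_CALL = -1


def sample_call_rates(calls: List[List[int]]) -> List[float]:
    """Per-sample call rate: fraction of SNPs with a genotype call (calls[sample][snp])."""
    return [sum(g != NO_CALL for g in row) / len(row) for row in calls]


def snp_call_rates(calls: List[List[int]]) -> List[float]:
    """Per-SNP call rate: fraction of samples with a genotype call at that SNP."""
    n_samples = len(calls)
    return [sum(row[j] != NO_CALL for row in calls) / n_samples
            for j in range(len(calls[0]))]


if __name__ == "__main__":
    calls = [
        [0, 1, 2, 2],    # sample 1: all four SNPs called
        [1, 1, 2, -1],   # sample 2: one no-call
        [0, -1, -1, 2],  # sample 3: two no-calls
    ]
    print(sample_call_rates(calls))  # [1.0, 0.75, 0.5]
    print(snp_call_rates(calls))
```

Computing the rate along both axes matters here because the abstract reports effects separately for samples and for SNPs with low call rates.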

RESULTS

Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, we performed genotype calling with the BRLMM algorithm under three different batch sizes and three different batch compositions. Comparative analysis of the calling results and of the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and batch composition affected the genotype calls and the significantly associated SNPs. The effects of batch size and composition were more severe on samples and SNPs with lower call rates than on those with higher call rates, and more severe on heterozygous genotype calls than on homozygous ones.
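The comparison described above can be illustrated with a small sketch that measures agreement between two calling runs on the same samples (for example, the same chips called in batches of two different sizes), split into heterozygous and homozygous calls, together with the overlap between two lists of significant SNPs. The genotype encoding, the function names, the rs identifiers, and the use of the Jaccard index as the overlap measure are all assumptions for illustration; the abstract does not state which concordance or overlap statistics the authors used.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical encoding (an assumption): 0 = AA, 1 = AB (heterozygous), 2 = BB, -1 = no call.
NO_CALL = -1


def concordance_by_class(run_a: List[List[int]], run_b: List[List[int]]) -> Dict[str, float]:
    """Agreement between two calling runs of the same samples x SNPs.

    Genotypes are stratified by run_a's call ("het" for AB, "hom" for AA or BB);
    positions that are no-calls in either run are skipped.
    """
    agree: Counter = Counter()
    total: Counter = Counter()
    for row_a, row_b in zip(run_a, run_b):
        for g_a, g_b in zip(row_a, row_b):
            if g_a == NO_CALL or g_b == NO_CALL:
                continue
            cls = "het" if g_a == 1 else "hom"
            total[cls] += 1
            agree[cls] += int(g_a == g_b)
    return {cls: agree[cls] / total[cls] for cls in total}


def jaccard(sig_a: Set[str], sig_b: Set[str]) -> float:
    """Overlap of two significant-SNP lists: size of intersection / size of union (1.0 if both empty)."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 1.0


if __name__ == "__main__":
    run_a = [[0, 1, 2], [1, 1, -1]]  # hypothetical calls from one batch configuration
    run_b = [[0, 2, 2], [1, 1, 0]]   # same chips called in a different configuration
    print(concordance_by_class(run_a, run_b))
    print(jaccard({"rs1", "rs2", "rs3"}, {"rs2", "rs3", "rs4"}))  # 0.5
```

Stratifying by genotype class is what exposes the pattern reported above, where heterozygous calls are less stable across batch configurations than homozygous ones.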

CONCLUSION

Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used when making genotype calls for GWAS. In addition, samples of high homogeneity should be placed in the same batch.
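The batching recommendation can be sketched as follows: group samples by a homogeneity label, then split each group into the fewest batches that respect a maximum size, keeping batch sizes within a group as even as possible. Using population labels as the homogeneity criterion, the function name homogeneous_batches, and the sample IDs are assumptions for illustration, not the batch designs evaluated in the paper. The panel sizes in the example follow the 270 HapMap samples (90 CEU, 90 YRI, 45 CHB, 45 JPT).

```python
from collections import defaultdict
from typing import Dict, List


def homogeneous_batches(labels: Dict[str, str], max_batch_size: int) -> List[List[str]]:
    """Split samples into batches that never mix labels (e.g. populations).

    Each label group is divided into the fewest batches that respect
    max_batch_size; within a group, batch sizes differ by at most one sample.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id in sorted(labels):
        groups[labels[sample_id]].append(sample_id)

    batches: List[List[str]] = []
    for label in sorted(groups):
        members = groups[label]
        n_batches = -(-len(members) // max_batch_size)  # ceiling division
        base, extra = divmod(len(members), n_batches)
        start = 0
        for i in range(n_batches):
            size = base + (1 if i < extra else 0)  # first `extra` batches take one more sample
            batches.append(members[start:start + size])
            start += size
    return batches


if __name__ == "__main__":
    # Hypothetical sample IDs; panel sizes match the 270 HapMap samples.
    panel_sizes = {"CEU": 90, "YRI": 90, "CHB": 45, "JPT": 45}
    labels = {f"{pop}_{i:02d}": pop for pop, n in panel_sizes.items() for i in range(n)}
    for batch in homogeneous_batches(labels, 50):
        print(labels[batch[0]], len(batch))
```

With a maximum of 50, every batch in this example holds exactly 45 samples from a single panel, which satisfies both recommendations at once: uniform batch sizes and homogeneous batch composition.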