Scharpf Robert B, Irizarry Rafael A, Ritchie Matthew E, Carvalho Benilton, Ruczinski Ingo
Department of Oncology, Johns Hopkins University School of Medicine, 550 N. Broadway, Suite 1103, Baltimore, MD 21218, United States of America.
J Stat Softw. 2011 May 1;40(12):1-32.
Genotyping platforms such as Affymetrix can be used to assess genotype-phenotype as well as copy number-phenotype associations at millions of markers. While genotyping algorithms are largely concordant when assessed on HapMap samples, tools to assess copy number changes are more variable and often discordant. One explanation for the discordance is that copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. Analysis algorithms that do not adjust for batch effects are prone to spurious measures of association. The R package crlmm implements a multilevel model that adjusts for batch effects and provides allele-specific estimates of copy number. This paper illustrates a workflow for the estimation of allele-specific copy number and integration of the marker-level estimates with complementary Bioconductor software for inferring regions of copy number gain or loss. All analyses are performed in the statistical environment R.