Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, NY 10065, USA.
Nucleic Acids Res. 2010 Jun;38(10):3275-86. doi: 10.1093/nar/gkq073. Epub 2010 Feb 15.
The detection of copy number variants (CNVs) by array-based platforms provides valuable insight into human diversity. However, suboptimal study design and data processing negatively affect CNV assessment. We quantitatively evaluate their impact when short-sequence oligonucleotide arrays (Affymetrix Genome-Wide Human SNP Array 6.0) are applied to CNV detection in 42 HapMap samples. Several processing and segmentation strategies are implemented, and the results are compared with CNV assessments obtained using an oligonucleotide array CGH platform designed to query CNVs at high resolution (Agilent). We demonstrate that the different reference models (e.g. single versus pooled sample reference) used to detect CNVs are a major source of inter-platform discrepancy (up to 30%) and that CNVs residing within segmental duplication regions (higher reference copy number) are significantly harder to detect (P < 0.0001). After adjusting the Affymetrix data to mimic the Agilent experimental design (reference sample effect), we applied several common segmentation approaches and evaluated their sensitivity and specificity for CNV detection, which ranged from 39-77% and 86-100%, respectively, for non-segmental duplication regions, and from 18-55% and 39-77%, respectively, for segmental duplications. Our results are relevant to any array-based CNV study and provide guidelines to optimize performance based on study-specific objectives.
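The sensitivity and specificity figures quoted above can be illustrated with a minimal sketch. It assumes a per-locus comparison in which each locus carries a yes/no CNV call from the reference platform and from the platform under evaluation. The `Locus` class, the `sensitivity_specificity` function, the `in_segdup` flag, and the toy data are hypothetical names and values introduced for illustration only; they are not the authors' pipeline or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """One genomic locus with calls from two platforms (hypothetical layout)."""
    name: str
    reference_has_cnv: bool  # call from the reference platform (e.g. array CGH)
    test_has_cnv: bool       # call from the platform being evaluated
    in_segdup: bool = False  # assumed flag: locus lies in a segmental duplication


def sensitivity_specificity(loci):
    """Return (sensitivity, specificity) of the test calls versus the reference calls.

    Either value is NaN when its denominator is zero.
    """
    tp = sum(l.reference_has_cnv and l.test_has_cnv for l in loci)
    fn = sum(l.reference_has_cnv and not l.test_has_cnv for l in loci)
    tn = sum(not l.reference_has_cnv and not l.test_has_cnv for l in loci)
    fp = sum(not l.reference_has_cnv and l.test_has_cnv for l in loci)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data, invented for illustration (not taken from the study).
toy_loci = [
    Locus("locus_1", True, True),
    Locus("locus_2", True, True),
    Locus("locus_3", True, True),
    Locus("locus_4", True, False),   # missed CNV (false negative)
    Locus("locus_5", False, False),
    Locus("locus_6", False, False),
    Locus("locus_7", False, False),
    Locus("locus_8", False, False),
    Locus("locus_9", False, True),   # spurious call (false positive)
]

if __name__ == "__main__":
    sens, spec = sensitivity_specificity(toy_loci)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

To report the two region classes separately, as the abstract does, the same function could be applied to the subsets of loci with `in_segdup` set to `True` and to `False`.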