Grup de Recerca de Reumatologia, Institut de Recerca de l'Hospital Universitari Vall d'Hebron (UAB), Barcelona, Spain.
BMC Bioinformatics. 2010 May 19;11:264. doi: 10.1186/1471-2105-11-264.
An in-depth understanding of the genetic basis of disease risk requires exhaustive knowledge of the types of genetic variation. Recently, Copy Number Variants (CNVs) have received much attention because of their potential role in susceptibility to common diseases. Copy Number Polymorphisms (CNPs) are of particular interest because they segregate at an appreciable frequency in the general population (i.e. > 1%) and may contribute to the genetic basis of common diseases.
This paper presents CNstream, a method for whole-genome CNV discovery and genotyping using Illumina Beadchip arrays. Compared with other methods, CNstream achieves a high level of accuracy by analyzing the measurements of each intensity channel separately and by combining information across multiple samples. It uses heuristics and parametric statistics to assign a confidence score to each sample at each probe, and it increases the sensitivity of the analysis by jointly calling the copy number state over a set of nearby, consecutive probes. The method was tested on a real dataset of 575 samples genotyped with the Illumina HumanHap 300 Beadchip, and its results showed a high correlation with the Database of Genomic Variants (DGV). The same samples were also analyzed with PennCNV, one of the most frequently used copy number inference methods for Illumina platforms. CNstream identified CNP loci that PennCNV did not detect and showed greater sensitivity at multiple other loci in the genome.
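The ingredients described above (per-channel analysis, pooling information across samples, a parametric score per sample and probe, and joint calling over consecutive probes) can be illustrated with a minimal Python sketch. It is not CNstream's actual algorithm. The robust z-score (median/MAD), the averaging of the two channel z-scores, Stouffer's combination over a fixed window, and the names `probe_scores`, `call_windows`, `window` and `threshold` are all assumptions chosen for illustration. The sketch also presumes intensities are already normalized so that, at each probe, samples with the normal copy number form a single cluster in each channel.

```python
import math
from statistics import median

# Scale factor that makes the MAD a consistent estimate of the SD for normal data.
MAD_TO_SD = 1.4826


def robust_z(values):
    """Robust z-scores of one probe in one channel, computed across all samples."""
    med = median(values)
    spread = median(abs(v - med) for v in values) * MAD_TO_SD
    if spread == 0:
        return [0.0] * len(values)
    return [(v - med) / spread for v in values]


def probe_scores(channel_a, channel_b):
    """Per-probe, per-sample score from the two intensity channels.

    channel_a[p][s] and channel_b[p][s] hold the normalized intensity of
    probe p in sample s. Each channel is standardized separately across
    samples, then the two z-scores are averaged (a simplifying assumption).
    """
    scores = []
    for row_a, row_b in zip(channel_a, channel_b):
        za, zb = robust_z(row_a), robust_z(row_b)
        scores.append([(x + y) / 2 for x, y in zip(za, zb)])
    return scores  # scores[p][s]


def call_windows(scores, sample, window=3, threshold=2.0):
    """Jointly test runs of `window` consecutive probes for one sample.

    Probe scores in each window are combined with Stouffer's method,
    sum(z) / sqrt(k), which assumes independent standard-normal scores.
    Overlapping windows are reported separately, not merged.
    """
    calls = []
    for start in range(len(scores) - window + 1):
        zs = [scores[p][sample] for p in range(start, start + window)]
        combined = sum(zs) / math.sqrt(window)
        if combined <= -threshold:
            calls.append((start, start + window - 1, "loss", combined))
        elif combined >= threshold:
            calls.append((start, start + window - 1, "gain", combined))
    return calls
```

The point of the joint step: a sample whose intensity drops moderately at three consecutive probes can score about -1.7 at each probe, which misses a per-probe threshold of 2. Combined over the three probes, the score is about -1.7 × √3 ≈ -2.9, which is called as a loss. This mirrors, in simplified form, why calling over nearby probes raises sensitivity.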
CNstream is a useful method for identifying and characterizing CNPs with Illumina genotyping microarrays. Compared with PennCNV, it has greater sensitivity at multiple CNP loci and allows more powerful statistical analysis in these regions. CNstream is therefore a robust CNP analysis tool for researchers who perform genome-wide association studies (GWAS) on Illumina platforms and aim to identify CNVs associated with the variables of interest. CNstream is implemented as a package for the R statistical software and works directly from the raw intensity files generated in Illumina GWAS projects. The method is available at http://www.urr.cat/cnv/cnstream.html.