Department of Health Sciences, University of Leicester, Leicester, United Kingdom.
PLoS One. 2009 Dec 4;4(12):e8175. doi: 10.1371/journal.pone.0008175.
The genetic contribution to sporadic amyotrophic lateral sclerosis (ALS) has not been fully elucidated. There are increasing efforts to characterise the role of copy number variants (CNVs) in human diseases; two previous studies concluded that CNVs may influence risk of sporadic ALS, with multiple rare CNVs more important than common CNVs. A little-explored issue surrounding genome-wide CNV association studies is that of post-calling filtering and merging of raw CNV calls. We undertook simulations to define filter thresholds and considered optimal ways of merging overlapping CNV calls for association testing, taking into consideration possibly overlapping or nested, but distinct, CNVs and boundary estimation uncertainty.
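To make the merging question concrete, the following is a minimal sketch of the simplest possible rule: collapsing any calls that share at least one base into a single region. It is an illustration under stated assumptions, not the procedure used in the study; the `Call` tuple layout and the function name `merge_overlapping_calls` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical call representation: (chromosome, start, end), end inclusive.
Call = Tuple[str, int, int]


def merge_overlapping_calls(calls: List[Call]) -> List[Call]:
    """Collapse calls on the same chromosome that share at least one base."""
    regions: List[Call] = []
    # Sorting by (chromosome, start, end) puts candidate overlaps next to each other.
    for chrom, start, end in sorted(calls):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlaps the current region: extend its end if this call reaches further.
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # No overlap with the current region: start a new one.
            regions.append((chrom, start, end))
    return regions


example_calls = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
merged = merge_overlapping_calls(example_calls)
```

A plain union like this fuses a small CNV nested inside a larger one into a single region, which is exactly the situation where distinct CNVs could be conflated; a stricter rule (for example, one requiring a minimum reciprocal overlap) would keep them apart at the cost of being more sensitive to boundary estimation error.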
In this study we screened Illumina 300K SNP genotyping data from 730 ALS cases and 789 controls for copy number variation. After applying quality control filters with thresholds defined by simulation, a total of 11,321 CNV calls were made across 575 cases and 621 controls. Using region-based and gene-based association analyses, we identified several loci showing nominally significant association. However, the choice of criteria for combining calls for association testing affects the ranking of results by significance. Several loci previously reported as associated with ALS were identified here. However, of another 15 genes previously reported as exhibiting ALS-specific copy number variation, only four exhibited copy number variation in this study. Potentially interesting novel loci were also identified, including EEF1D, a translation elongation factor involved in the delivery of aminoacyl tRNAs to the ribosome (a process previously implicated in genetic studies of spinal muscular atrophy), but these must be treated with caution because of concerns surrounding genomic location and platform suitability.
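As an illustration of what a region-based test of this kind might look like, the sketch below compares CNV carrier counts between cases and controls with a two-sided Fisher's exact test. The abstract does not state which test statistic was used, so the choice of Fisher's exact test is an assumption, as are the function name `fisher_exact_two_sided` and the carrier counts; only the sample sizes (575 cases, 621 controls) come from the text.

```python
from math import comb


def fisher_exact_two_sided(case_carriers: int, n_cases: int,
                           control_carriers: int, n_controls: int) -> float:
    """Two-sided Fisher's exact p-value for carrier counts in cases vs controls."""
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    denom = comb(total, carriers)

    def prob(k: int) -> float:
        # Hypergeometric probability that k of the carriers are cases.
        return comb(n_cases, k) * comb(n_controls, carriers - k) / denom

    observed = prob(case_carriers)
    lowest = max(0, carriers - n_controls)
    highest = min(carriers, n_cases)
    # Sum every table at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    p_value = sum(prob(k) for k in range(lowest, highest + 1)
                  if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical region: 9 carriers among 575 cases, 1 among 621 controls.
p_region = fisher_exact_two_sided(9, 575, 1, 621)
```

Because the carrier counts for a region depend on how overlapping calls were combined, the same raw calls can yield different p-values, and therefore a different ranking of loci, under different merging rules.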
When CNV association findings are based on early genome-wide genotyping platforms and modest study sizes, their interpretation must take into account the effects of filtering and combining CNV calls.