Lord Jenny, Turton James, Medway Christopher, Shi Hui, Brown Kristelle, Lowe James, Mann David, Pickering-Brown Stuart, Kalsheker Noor, Passmore Peter, Morgan Kevin
Human Genetics, School of Molecular Medical Sciences, Queens Medical Centre, University of Nottingham, Nottingham, UK.
Int J Mol Epidemiol Genet. 2012;3(4):262-75. Epub 2012 Nov 15.
CLU, PICALM and CR1 were identified as genetic risk factors for late-onset Alzheimer's disease (AD) in two large genome-wide association studies (GWAS) published in 2009, but the variants that confer this change in disease risk, and how the genes relate to AD pathology, are yet to be discovered. A next-generation sequencing (NGS) project targeting CLU, CR1 and PICALM was conducted in 96 AD samples (8 pools of 12) in an attempt to discover rare variants within these AD-associated genes. Inclusion of repetitive regions in the design of the SureSelect capture led to significant issues in alignment of the data, resulting in poor specificity and a lower than expected depth of coverage. A strong positive correlation (0.964, p<0.001) was seen between NGS and 1000 Genomes Project frequency estimates. Of the ~170 "novel" variants detected in the genes, seven SNPs, all of which were present in multiple sample pools, were selected for validation by Sanger sequencing. Two SNPs were successfully validated by this method and shown to be genuine variants, while five failed validation. These spurious SNP calls resulted from the presence of small indels and mononucleotide repeats, indicating that calls near such features should be regarded with caution and that validation via an independent method is important for NGS variant calls.
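The caution about mononucleotide repeats can be illustrated with a minimal Python sketch that flags a candidate SNP position when a single-base run lies close to it, marking the call as a priority for independent validation. This is not the pipeline used in the study: the function names, the `flank` window and the `min_run` threshold are hypothetical choices made for illustration only.

```python
def longest_homopolymer_near(seq: str, pos: int, flank: int = 5) -> int:
    """Length of the longest single-base run overlapping the window
    [pos - flank, pos + flank] around a 0-based position in seq."""
    seq = seq.upper()
    n = len(seq)
    start = max(0, pos - flank)
    end = min(n, pos + flank + 1)  # exclusive end of the window
    best = 0
    i = 0
    while i < n:
        # Extend j to the end of the run of identical bases starting at i.
        j = i
        while j < n and seq[j] == seq[i]:
            j += 1
        # The run occupies seq[i:j]; count it if it overlaps the window.
        if i < end and j > start:
            best = max(best, j - i)
        i = j
    return best


def flag_for_validation(seq: str, pos: int, min_run: int = 4, flank: int = 5) -> bool:
    """True if a candidate SNP at pos sits near a mononucleotide repeat of at
    least min_run bases (threshold is an assumption, not from the study)."""
    return longest_homopolymer_near(seq, pos, flank) >= min_run


if __name__ == "__main__":
    reference = "ACGTAAAAAAGCTAGC"  # made-up sequence, not real CLU/CR1/PICALM data
    print(flag_for_validation(reference, pos=10, flank=2))
```

A flag from a check like this does not mean the call is false; it only suggests that the site deserves confirmation by an independent method such as Sanger sequencing.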