Ho Meng-Ru, Tsai Kuo-Wang, Chen Chun-houh, Lin Wen-chang
Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan.
Nucleic Acids Res. 2011 Jan;39(Database issue):D920-5. doi: 10.1093/nar/gkq1197. Epub 2010 Nov 21.
Gene duplications are scattered widely throughout the human genome. A single-base difference between nearly identical duplicated segments may be misjudged as a single nucleotide polymorphism (SNP) in individuals. Current genotyping methods cannot distinguish between these two cases. As next-generation sequencing technologies become more popular for sequence-based association studies, numerous ambiguous SNPs are rapidly accumulating. Analyzing duplication variations in the reference genome to help prevent false-positive SNPs is therefore imperative. We found that >10% of human genes are associated with duplicated gene loci (DGL). Through meticulous sequence alignments of DGL, we systematically designated 1,236,956 variations as duplicated gene nucleotide variants (DNVs). The DNV database (dbDNV) (http://goods.ibms.sinica.edu.tw/DNVs/) has been established to promote more accurate variation annotation. In addition to flat-file download, users can explore gene-related duplications and their associated DNVs through the DGL search and the DNV search, respectively. The dbDNV also contains 304,110 DNV-coupled SNPs. With the DNV-coupled SNP search, users can see which SNP records are also variants among duplicates. This is useful because ∼58% of exonic SNPs in DGL are DNV-coupled. Given the rapid accumulation of ambiguous SNPs, we suggest that annotating SNPs with their possible DNV status should improve association studies of these variants with human diseases.
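The difference between a DNV and a true SNP, and what "DNV-coupled" means, can be made concrete with a small sketch. This is a hypothetical illustration, not the dbDNV pipeline: it assumes two duplicate copies that are already aligned without gaps and on the same strand, treats every mismatched column as a candidate DNV at both loci, and then flags SNP records whose positions coincide with a DNV site. The names `Variant`, `call_dnvs` and `dnv_coupled_snps`, and all coordinates and SNP identifiers, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single-base difference at one locus (hypothetical record type)."""
    chrom: str
    pos: int       # 1-based position on the reference
    ref_base: str  # base at this locus
    alt_base: str  # base at the aligned position in the other duplicate


def call_dnvs(copy_a, copy_b):
    """Call candidate DNVs between two duplicate copies.

    Each copy is a (chrom, start, seq) tuple with a 1-based start.
    Assumption: the copies are already aligned without gaps and on the
    same strand, so column i of one copy pairs with column i of the other.
    """
    chrom_a, start_a, seq_a = copy_a
    chrom_b, start_b, seq_b = copy_b
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles gapless, equal-length alignments")
    dnvs = []
    for offset, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == b or "N" in (a, b):
            continue  # identical or undetermined columns are not DNVs
        # A read from either copy could be mapped to the other locus,
        # so the mismatch is recorded at both positions.
        dnvs.append(Variant(chrom_a, start_a + offset, a, b))
        dnvs.append(Variant(chrom_b, start_b + offset, b, a))
    return dnvs


def dnv_coupled_snps(snps, dnvs):
    """Return IDs of SNPs, given as (id, chrom, pos), that fall on a DNV site."""
    dnv_sites = {(d.chrom, d.pos) for d in dnvs}
    return [snp_id for snp_id, chrom, pos in snps if (chrom, pos) in dnv_sites]


# Toy data: two 10-bp duplicate copies that differ only at offset 4.
copy_a = ("chr1", 1000, "ACGTACGTAC")
copy_b = ("chr5", 5000, "ACGTTCGTAC")
dnvs = call_dnvs(copy_a, copy_b)
snps = [("snp_a", "chr1", 1004), ("snp_b", "chr1", 1002), ("snp_c", "chr5", 5004)]
print(dnv_coupled_snps(snps, dnvs))  # ['snp_a', 'snp_c']
```

A real analysis would also have to handle indels, reverse-complement duplicates and loci with more than two copies; those are left out here so that the core idea (a mismatch between duplicates can masquerade as a SNP at either locus) stays visible.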