Weckx Stefan, Del-Favero Jurgen, Rademakers Rosa, Claes Lieve, Cruts Marc, De Jonghe Peter, Van Broeckhoven Christine, De Rijk Peter
Department of Molecular Genetics, Flanders Interuniversity Institute for Biotechnology, University of Antwerp, Antwerpen, Belgium.
Genome Res. 2005 Mar;15(3):436-42. doi: 10.1101/gr.2754005.
Technological improvements have shifted sequencing from low-throughput, labor-intensive, gel-based systems to high-throughput capillary systems. As a result, genomic resequencing is now widely used to identify sequence variations in genes, regulatory regions, and extended genomic regions. We describe a software package, novoSNP, that discovers single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs) in sequence trace files in a fast, reliable, and user-friendly way. We compared the performance of novoSNP with that of PolyPhred and PolyBayes on two data sets. The first data set comprised 1028 sequence trace files obtained from diagnostic mutation analyses of SCN1A (neuronal voltage-gated sodium channel alpha-subunit type I gene). The second data set comprised 9062 sequence trace files from a genomic resequencing project aimed at constructing a high-density SNP map of MAPT (microtubule-associated protein tau gene). Visual inspection of these data sets had identified 38 sequence variations for SCN1A and 488 for MAPT. novoSNP automatically identified all 38 SCN1A variations, including five INDELs, while for MAPT it correctly marked all but 15 of the 488 variations. PolyPhred detected far fewer SNPs than novoSNP and missed nearly all INDELs. PolyBayes, which was designed for the sequence analysis of cloned templates, detected only a limited number of the variations present in the data set. Besides significantly improving the automated detection of sequence variations in both diagnostic mutation analyses and SNP discovery projects, novoSNP offers a user-friendly interface for inspecting possible genetic variations.
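The detection figures reported in the abstract can be turned into rates with a few lines of arithmetic. The sketch below is only an illustration: it assumes the variations found by visual inspection serve as the reference set, and the variable and function names are hypothetical, not part of novoSNP. The abstract gives no false-positive counts and no exact counts for PolyPhred or PolyBayes, so only the novoSNP detection rate is computed.

```python
# Counts taken from the abstract. Variations found by visual inspection are
# treated as the reference set (an assumption made for this illustration).
reference_counts = {"SCN1A": 38, "MAPT": 488}

# Reference variations that novoSNP did not correctly mark, per the abstract.
not_correctly_marked = {"SCN1A": 0, "MAPT": 15}


def detection_rate(total: int, missed: int) -> float:
    """Return the fraction of reference variations that were correctly marked."""
    return (total - missed) / total


rates = {
    gene: detection_rate(reference_counts[gene], not_correctly_marked[gene])
    for gene in reference_counts
}

for gene, rate in rates.items():
    print(f"{gene}: {rate:.1%} of reference variations correctly marked")
```

Under these assumptions, novoSNP correctly marked 38 of 38 SCN1A variations (100%) and 473 of 488 MAPT variations (about 96.9%).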