Médigue C, Rose M, Viari A, Danchin A
Institut Pasteur REG, F-75724 Paris Cedex 15, France. claudine.medigue@snv.jussieu.fr
Genome Res. 1999 Nov;9(11):1116-27. doi: 10.1101/gr.9.11.1116.
During the determination of a DNA sequence, the introduction of artifactual frameshifts and/or in-frame stop codons in putative genes can lead to misprediction of gene products. Detecting such errors by protein similarity matching is possible only when related sequences are available in databases. Here, we present a method for detecting frameshift errors in DNA sequences that is based on the intrinsic properties of the coding sequences. It combines the results of two analyses: the search for translational initiation/termination sites and the prediction of coding regions. This method was used to screen the complete Bacillus subtilis genome sequence, and the regions flanking putative errors were resequenced for verification. This procedure allowed us to correct the sequence and to analyze in detail the nature of the errors. Interestingly, in several cases the in-frame termination codons or frameshifts were not sequencing errors but were confirmed to be present in the chromosome, indicating that the genes are either nonfunctional (pseudogenes) or subject to regulatory processes such as programmed translational frameshifts. The method can be used to check the quality of the sequences produced by any prokaryotic genome sequencing project.
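The idea of flagging frameshifts from intrinsic sequence properties can be illustrated with a deliberately simplified sketch. It is not the authors' procedure: it replaces both the initiation-site search and the coding-region prediction with a crude stand-in (long stop-free stretches on the forward strand), and reports places where one such stretch ends close to where another begins in a different frame. The function names `open_frames` and `candidate_frameshifts` and the thresholds `min_codons` and `window` are assumptions made for this illustration only.

```python
STOPS = {"TAA", "TAG", "TGA"}


def open_frames(seq, min_codons=30):
    """Return (frame, start, end) for stop-free stretches on the forward strand.

    Coordinates are 0-based with an exclusive end. Assumption: a long
    stop-free stretch stands in for a predicted coding region.
    """
    seq = seq.upper()
    segments = []
    for frame in range(3):
        seg_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOPS:
                if (pos - seg_start) // 3 >= min_codons:
                    segments.append((frame, seg_start, pos))
                seg_start = pos + 3  # resume after the stop codon
            pos += 3
        # Stretch that runs to the end of the sequence without a stop.
        if (pos - seg_start) // 3 >= min_codons:
            segments.append((frame, seg_start, pos))
    return segments


def candidate_frameshifts(seq, min_codons=30, window=30):
    """Pair stretches in different frames that meet within `window` nt.

    Assumption: an upstream stretch ending near the start of a downstream
    stretch in another frame is worth checking for a frameshift.
    """
    segs = sorted(open_frames(seq, min_codons), key=lambda s: s[1])
    hits = []
    for i, (f1, s1, e1) in enumerate(segs):
        for f2, s2, e2 in segs[i + 1:]:
            if f1 != f2 and s1 < s2 and e1 < e2 and abs(s2 - e1) <= window:
                hits.append(((f1, s1, e1), (f2, s2, e2)))
    return hits


if __name__ == "__main__":
    # Toy unit: open in its own frame, stops in the other two frames.
    unit = "CTAACTGAC"
    shifted = unit * 20 + "G" + unit * 20  # one inserted base mid-"gene"
    for upstream, downstream in candidate_frameshifts(shifted):
        print("possible frameshift between", upstream, "and", downstream)
```

A candidate reported this way only marks a region to inspect or resequence; as the abstract notes, some frameshifts and in-frame stops are genuinely present in the chromosome rather than being sequencing errors.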