Fichant G A, Quentin Y
Institut de Génétique et Microbiologie, Université Paris-Sud, Orsay, France.
Nucleic Acids Res. 1995 Aug 11;23(15):2900-8. doi: 10.1093/nar/23.15.2900.
Frameshift errors are not the most frequent errors made when determining DNA sequences, but they are the most troublesome, because each one corrupts the amino acid sequence over several residues. Sequence alignment can detect such errors only when related sequences are found in the databases. To avoid this limitation, we have developed a new tool based on the distribution of non-overlapping 3-tuples or 6-tuples in the three frames of an ORF. The method relies on the results of a correspondence analysis. It has been extensively tested on Bacillus subtilis and Saccharomyces cerevisiae sequences and has also been examined with human sequences. The results indicate that it can detect frameshift errors affecting as few as 20 bp with a low rate of false positives (no more than 1.0 per 1000 bp scanned). The proposed algorithm can be used to scan large collections of data, but it is mainly intended for laboratory practice, as a tool for checking the quality of the sequences produced during a sequencing project.
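The abstract does not spell out how the tuples are tabulated, so the following Python sketch is only an illustration of the counting step it mentions: tallying non-overlapping k-tuples (k = 3 or 6) in each of the three frames of an ORF. The function name `frame_tuple_counts`, its parameters, and the toy sequence are assumptions made for this example. The correspondence analysis and the frameshift detection rule are not reproduced here.

```python
from collections import Counter


def frame_tuple_counts(orf, k=3):
    """Count non-overlapping k-tuples in each of the three frames of `orf`.

    Hypothetical helper for illustration only; it is not the authors' code.
    Returns a list of three Counters, one per frame offset (0, 1, 2).
    """
    seq = orf.upper()
    counts = []
    for offset in range(3):
        # Step by k so that consecutive tuples do not overlap.
        tuples = (seq[i:i + k] for i in range(offset, len(seq) - k + 1, k))
        counts.append(Counter(tuples))
    return counts


# Toy sequence (an assumption, not data from the paper).
counts = frame_tuple_counts("ATGGCTGCTTAA")
for offset, frame_counts in enumerate(counts):
    print(offset, dict(frame_counts))
```

Passing `k=6` tallies non-overlapping 6-tuples instead. Per-frame count tables of this kind would be the natural input to a correspondence analysis, although the exact table layout used in the paper is not given in the abstract.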