Amino acid translation program for full-length cDNA sequences with frameshift errors.

Author information

Fukunishi Y, Hayashizaki Y

Affiliation

Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama City, Kanagawa 230-0045, Japan.

Publication information

Physiol Genomics. 2001 Mar 8;5(2):81-7. doi: 10.1152/physiolgenomics.2001.5.2.81.

DOI:10.1152/physiolgenomics.2001.5.2.81

PMID:11242592

Abstract

Here we present an amino acid translation program designed to suggest the positions of experimental frameshift errors and to predict amino acid sequences for full-length cDNA sequences that have phred scores. Our program introduces artificial insertions or artificial deletions at low-accuracy positions of the original sequence, thereby generating many candidate sequences. The validity of the most probable sequence (the likelihood that it represents the actual protein) is evaluated by using a score (V(a)) that is calculated in light of the Kozak consensus, preferred codon usage, and the position of the initiation codon. To evaluate the software, we used a database in which, out of 612 cDNA sequences, 524 (86%) carried 773 frameshift errors in the coding sequence. Our software detected and corrected 48% of the total frameshift errors in 62% of the total cDNA sequences with frameshift errors. The false positive rate of frameshift correction was 9%, and 91% of the suggested frameshifts were true.
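The candidate-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `candidate_sequences`, the phred threshold of 20, the cap of two edits per candidate, and the placeholder base `N` used for insertions are all assumptions made for illustration. The V(a) score is not reproduced here because the abstract does not give its formula.

```python
from itertools import combinations, product


def candidate_sequences(seq, phred, threshold=20, max_edits=2, filler="N"):
    """Yield (edited_sequence, edits) pairs for a cDNA read.

    Hypothetical helper: positions whose phred score is below `threshold`
    are treated as low-accuracy, and every combination of up to
    `max_edits` single-base insertions or deletions at those positions
    produces one candidate. The unedited sequence is yielded first.
    """
    if len(seq) != len(phred):
        raise ValueError("seq and phred must have the same length")

    # Indices of low-accuracy bases (assumed cutoff, not from the paper).
    low = [i for i, q in enumerate(phred) if q < threshold]

    for k in range(max_edits + 1):
        for positions in combinations(low, k):
            for ops in product(("ins", "del"), repeat=k):
                chars = list(seq)
                # Apply edits from right to left so that earlier
                # indices remain valid after each edit.
                for pos, op in sorted(zip(positions, ops), reverse=True):
                    if op == "del":
                        del chars[pos]
                    else:
                        chars.insert(pos, filler)
                yield "".join(chars), tuple(zip(positions, ops))


if __name__ == "__main__":
    read = "ATGGCCTAA"
    quality = [40, 40, 40, 10, 40, 40, 40, 40, 40]
    for candidate, edits in candidate_sequences(read, quality):
        print(candidate, edits)
```

In a full pipeline, each candidate would then be translated and ranked by a validity score; because the number of candidates grows quickly with the number of low-accuracy positions, a cap such as `max_edits` keeps the search tractable in this sketch.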
