Gish W, States D J
National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894-0001.
Nat Genet. 1993 Mar;3(3):266-72. doi: 10.1038/ng0393-266.
Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performs conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to substitution, insertion and deletion errors in the query sequence and to sequence divergence. Reading frames were reliably identified in the presence of 1% query errors, a rate that is typical for primary sequence data. BLASTX is therefore appropriate for use in moderate- and large-scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
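The conceptual translation step described in the abstract can be illustrated with a short sketch. This is not the BLASTX implementation, which the abstract does not describe; it only shows, assuming the standard genetic code, how a nucleotide query can be translated in all six reading frames before any protein comparison. The function names (`reverse_complement`, `translate`, `six_frame_translations`) and the frame labels +1 to +3 and -1 to -3 are illustrative choices.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3 forward, -1..-3 reverse strand) to translations."""
    frames = {}
    for strand, strand_seq in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            frames[strand * (offset + 1)] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(f"{frame:+d}: {protein}")
```

Translating every frame is relevant to the error classes studied here: a substitution changes at most one residue, whereas an insertion or deletion shifts the downstream coding sequence into a different frame on the same strand, so similarity to a database protein can continue in another of the six translations.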