States D J, Gish W
Institute for Biomedical Computing, Washington University, St. Louis, MO 63108, USA.
J Comput Biol. 1994 Spring;1(1):39-50. doi: 10.1089/cmb.1994.1.39.
A computer program called BLASTX was previously shown to be effective in identifying and assigning putative function to likely protein coding regions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present and assess the sensitivity of a new option to this software tool, herein called BLASTC, which employs information obtained from biases in codon utilization, along with the information obtained from sequence similarity. A rationale for combining these diverse information sources was derived, and the information available from codon utilization was analyzed in several species and found to vary widely among them. On average, codon bias information improved the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources for the interpretation of positive findings are discussed.
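One way to picture combining the two information sources is to express each as a log-odds score in bits and add them. The sketch below is illustrative only and is not taken from BLASTC: the function names, the uniform 1/64 background, the toy codon frequencies, and the assumption that the two scores are independent and therefore additive are all hypothetical.

```python
import math
from typing import Dict


def codon_log_odds_bits(
    orf: str,
    coding_freq: Dict[str, float],
    background_freq: float = 1.0 / 64.0,
) -> float:
    """Sum log2(coding frequency / background frequency) over in-frame codons.

    Assumption: the background model is uniform over the 64 codons.
    Codons missing from the table (for example, ones containing N)
    contribute nothing to the score.
    """
    usable_length = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    score = 0.0
    for start in range(0, usable_length, 3):
        codon = orf[start:start + 3].upper()
        freq = coding_freq.get(codon)
        if freq is None or freq <= 0.0:
            continue
        score += math.log2(freq / background_freq)
    return score


def combined_score_bits(similarity_bits: float, codon_bits: float) -> float:
    """Add the two log-odds scores.

    Assumption: the similarity evidence and the codon-usage evidence are
    independent, so their log-odds scores can simply be summed.
    """
    return similarity_bits + codon_bits


# Hypothetical toy frequencies, NOT measured human codon usage.
toy_coding_freq = {"CTG": 1.0 / 16.0, "GAG": 1.0 / 32.0}

codon_bits = codon_log_odds_bits("CTGGAG", toy_coding_freq)
total_bits = combined_score_bits(10.0, codon_bits)
```

With these toy values, CTG is four times and GAG twice as frequent as the uniform background, so the codon term adds 2 + 1 = 3 bits to a hypothetical similarity score of 10 bits. A real table would hold species-specific frequencies, which the abstract notes vary widely in how much information they carry.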