Henikoff S, Wallace J C, Brown J P
Methods Enzymol. 1990;183:111-32. doi: 10.1016/0076-6879(90)83009-x.
In this chapter we describe strategies for searching translated nucleotide sequence databases. By applying standard searching techniques developed for protein databases, we have found that previously unrecognized homologies can be detected. In addition, we have shown that extremely high sensitivity can be obtained by using the scoring matrix strategy for short regions of similarity. The latter approach is particularly effective for detecting homologs found at the ends of sequences and within data of poor quality. Each of these methods is demonstrated for the LysR family of bacterial activator proteins. Applying the methods in succession allows sensitive detection of complex relationships, as demonstrated for the AraC family and for the complex LuxR-OmpR-NtrC families of bacterial activator proteins. Although our examples are drawn from bacterial sequences, these methods are likewise effective for higher eukaryotic genomic sequences, where protein-coding sequences are usually interrupted by introns. This should be particularly important in the future, since much of the expected growth of nucleotide sequence databases is likely to come from eukaryotic genomic sequencing projects.
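The two ideas in the abstract, translating nucleotide sequence in all reading frames and scoring short regions with a matrix, can be illustrated with a minimal Python sketch. It is not the chapter's implementation. It assumes the standard genetic code and a position-specific log-odds matrix built from a few aligned segments with a uniform background and pseudocounts. The function names (`six_frames`, `build_matrix`, `best_hit`, `search`) and the toy segments are hypothetical and are not taken from the LysR, AraC, or LuxR-OmpR-NtrC data. Introns are not handled.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def build_matrix(segments, pseudocount=1.0):
    """Build a position-specific log-odds matrix from equal-length segments.

    Assumption: uniform background of 1/20 per residue, chosen for brevity.
    """
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for column in zip(*segments):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def best_hit(protein, matrix, unknown=-4.0):
    """Slide the matrix along a translation; return (score, start, window)."""
    width = len(matrix)
    best = None
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        # Stop codons ('*') and 'X' receive the fixed penalty `unknown`.
        score = sum(col.get(aa, unknown) for col, aa in zip(matrix, window))
        if best is None or score > best[0]:
            best = (score, start, window)
    return best


def search(dna, matrix):
    """Rank the best-scoring window of each frame, highest score first."""
    hits = []
    for frame, protein in six_frames(dna).items():
        hit = best_hit(protein, matrix)
        if hit is not None:
            hits.append((hit[0], frame, hit[1], hit[2]))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Toy data: a made-up 5-residue segment encoded on the reverse strand.
    toy_matrix = build_matrix(["HTLSE", "HSLSQ", "HTVSE"])
    toy_dna = reverse_complement("GG" + "CATACCCTGAGCGAA" + "TTT")
    for score, frame, start, window in search(toy_dna, toy_matrix):
        print(f"{frame}\t{start}\t{window}\t{score:.2f}")
```

In this toy case the top-ranked window is `HTLSE` in frame `-3`, although the encoded segment starts two bases from the end of the reverse strand. Because every frame is scanned independently with a short window, a match can still be found when it lies near the end of a sequence or when the reading frame is lost elsewhere.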