Henikoff J G, Henikoff S
Howard Hughes Medical Institute, Seattle, Washington 98109-1024, USA.
Genome Res. 2000 Apr;10(4):543-6. doi: 10.1101/gr.10.4.543.
A simple and general homology-based method for gene finding was applied to the 2.9-Mb Drosophila melanogaster Adh region, the target sequence of the Genome Annotation Assessment Project (GASP). Each strand of the entire sequence was used as a query against the BLOCKS+ database of conserved protein regions. This search yielded functional assignments for more than one-third of the genes and two-thirds of the transposons. Given the enormous size of the query, the fact that only two false-positive matches were reported underscores the high selectivity of protein family-based methods for gene finding. We also used the search results to improve BLOCKS+ by identifying compositionally biased blocks. Our results confirm that protein family databases can be used effectively in automated sequence annotation efforts.
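Searching each strand of a DNA sequence against a protein database implies comparing conceptual translations of both strands with the protein entries. The abstract does not describe how the search tool does this, so the following is only a minimal sketch of that one step, assuming the standard genetic code and three reading frames per strand. The function names (`reverse_complement`, `translate`, `six_frames`) are hypothetical and are not taken from the BLOCKS+ software.

```python
# Illustrative sketch only: conceptual six-frame translation of a DNA query.
# Assumes the standard genetic code; this is not the authors' implementation.

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)

# Build the codon table from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(dna.upper()))


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


def six_frames(dna: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to their conceptual translations."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

Each translated frame could then be compared with protein family blocks by whatever search tool is in use; that comparison is outside the scope of this sketch.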