Department of Life Science, Imperial College London, London, UK.
Bioinformatics. 2010 Nov 1;26(21):2664-71. doi: 10.1093/bioinformatics/btq527. Epub 2010 Sep 15.
Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe.
We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.